All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/18] libsas, sas_ata: update for 3.5
@ 2012-05-06 18:17 Dan Williams
  2012-05-06 18:18 ` [PATCH 01/18] libsas: cleanup spurious calls to scsi_schedule_eh Dan Williams
                   ` (18 more replies)
  0 siblings, 19 replies; 21+ messages in thread
From: Dan Williams @ 2012-05-06 18:17 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide

1/ libsas suspend/resume support

2/ finish off the conversion of the strategy handlers to enforce only
   invoking them in eh context (to meet libata's expectations).

3/ a collection of discovery fixes, error handling fixes, and
   miscellaneous cleanups

All of these patches, save for patch 18, are on their 2nd or 3rd resend.
These have been soaking in -next since before the 3.4 merge window
opened.

[PATCH 01/18] libsas: cleanup spurious calls to scsi_schedule_eh
[PATCH 02/18] libata, libsas: introduce sched_eh and end_eh port ops
[PATCH 03/18] scsi: fix eh wakeup (scsi_schedule_eh vs scsi_restart_operations)
[PATCH 04/18] scsi_transport_sas: fix delete vs scan race
[PATCH 05/18] libsas: enforce eh strategy handlers only in eh context
[PATCH 06/18] libsas: add sas_eh_abort_handler
[PATCH 07/18] libsas: use ->lldd_I_T_nexus_reset for ->eh_bus_reset_handler
[PATCH 08/18] isci: use sas eh strategy handlers
[PATCH 09/18] libsas: trim sas_task of slow path infrastructure
[PATCH 10/18] libsas: sas_rediscover_dev did not look at the SMP exec status.
[PATCH 11/18] mvsas: remove unused variable in mvs_task_exec()
[PATCH 12/18] libata: reset once
[PATCH 13/18] libsas: continue revalidation
[PATCH 14/18] libata: export ata_port suspend/resume infrastructure for sas
[PATCH 15/18] libsas: drop sata port multiplier infrastructure
[PATCH 16/18] scsi, sd: limit the scope of the async probe domain
[PATCH 17/18] libsas: suspend / resume support
[PATCH 18/18] scsi: cleanup setting task state in scsi_error_handler()

 Documentation/kernel-parameters.txt |    3 +
 drivers/ata/libata-core.c           |   63 +++++++++--
 drivers/ata/libata-eh.c             |   59 +++++++++--
 drivers/scsi/isci/init.c            |    3 +
 drivers/scsi/libsas/sas_ata.c       |  125 ++++++++++++++++++++++
 drivers/scsi/libsas/sas_discover.c  |   92 +++++++++++++----
 drivers/scsi/libsas/sas_dump.c      |    1 
 drivers/scsi/libsas/sas_event.c     |   16 +--
 drivers/scsi/libsas/sas_expander.c  |   35 ++++--
 drivers/scsi/libsas/sas_init.c      |  129 +++++++++++++++++++++--
 drivers/scsi/libsas/sas_internal.h  |    1 
 drivers/scsi/libsas/sas_phy.c       |   21 ++++
 drivers/scsi/libsas/sas_port.c      |   52 +++++++++
 drivers/scsi/libsas/sas_scsi_host.c |  195 +++++++++++++++++++++++++++++++----
 drivers/scsi/mvsas/mv_sas.c         |   21 ++--
 drivers/scsi/pm8001/pm8001_sas.c    |   37 +++----
 drivers/scsi/scsi.c                 |    4 +
 drivers/scsi/scsi_error.c           |   19 +++
 drivers/scsi/scsi_lib.c             |   10 +-
 drivers/scsi/scsi_pm.c              |    2 
 drivers/scsi/scsi_priv.h            |    2 
 drivers/scsi/scsi_transport_sas.c   |    6 +
 drivers/scsi/sd.c                   |    5 +
 include/linux/libata.h              |   16 +++
 include/scsi/libsas.h               |   50 +++++++--
 include/scsi/sas_ata.h              |   15 +++
 26 files changed, 827 insertions(+), 155 deletions(-)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 01/18] libsas: cleanup spurious calls to scsi_schedule_eh
  2012-05-06 18:17 [PATCH 00/18] libsas, sas_ata: update for 3.5 Dan Williams
@ 2012-05-06 18:18 ` Dan Williams
  2012-05-06 18:18 ` [PATCH 02/18] libata, libsas: introduce sched_eh and end_eh port ops Dan Williams
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: Dan Williams @ 2012-05-06 18:18 UTC (permalink / raw)
  To: linux-scsi; +Cc: Maciej Trela, linux-ide, Artur Wojcik, Jacek Danecki

From: Maciej Trela <maciej.trela@intel.com>

eh is woken up automatically by the presence of failed commands,
scsi_schedule_eh is reserved for cases where there are no failed
commands.  This guarantees that host_eh_sceduled is only incremented
when an explicit eh request is made.

Reviewed-by: Jacek Danecki <jacek.danecki@intel.com>
Signed-off-by: Maciej Trela <maciej.trela@intel.com>
[fixed spurious delete of sas_ata_task_abort]
Signed-off-by: Artur Wojcik <artur.wojcik@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/libsas/sas_ata.c       |    1 -
 drivers/scsi/libsas/sas_scsi_host.c |    1 -
 2 files changed, 2 deletions(-)

diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
index 441d88a..6c31329 100644
--- a/drivers/scsi/libsas/sas_ata.c
+++ b/drivers/scsi/libsas/sas_ata.c
@@ -591,7 +591,6 @@ void sas_ata_task_abort(struct sas_task *task)
 		spin_lock_irqsave(q->queue_lock, flags);
 		blk_abort_request(qc->scsicmd->request);
 		spin_unlock_irqrestore(q->queue_lock, flags);
-		scsi_schedule_eh(qc->scsicmd->device->host);
 		return;
 	}
 
diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index f0b9b7b..17339e5 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -1003,7 +1003,6 @@ void sas_task_abort(struct sas_task *task)
 		spin_lock_irqsave(q->queue_lock, flags);
 		blk_abort_request(sc->request);
 		spin_unlock_irqrestore(q->queue_lock, flags);
-		scsi_schedule_eh(sc->device->host);
 	}
 }
 


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 02/18] libata, libsas: introduce sched_eh and end_eh port ops
  2012-05-06 18:17 [PATCH 00/18] libsas, sas_ata: update for 3.5 Dan Williams
  2012-05-06 18:18 ` [PATCH 01/18] libsas: cleanup spurious calls to scsi_schedule_eh Dan Williams
@ 2012-05-06 18:18 ` Dan Williams
  2012-05-06 18:18 ` [PATCH 03/18] scsi: fix eh wakeup (scsi_schedule_eh vs scsi_restart_operations) Dan Williams
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: Dan Williams @ 2012-05-06 18:18 UTC (permalink / raw)
  To: linux-scsi; +Cc: Tejun Heo, linux-ide, Jeff Garzik, Jacek Danecki

When managing shost->host_eh_scheduled libata assumes that there is a
1:1 shost-to-ata_port relationship.  libsas creates a 1:N relationship
so it needs to manage host_eh_scheduled cumulatively at the host level.
The sched_eh and end_eh port port ops allow libsas to track when domain
devices enter/leave the "eh-pending" state under ha->lock (previously
named ha->state_lock, but it is no longer just a lock for ha->state
changes).

Since host_eh_scheduled indicates eh without backing commands pinning
the device it can be deallocated at any time.  Move the taking of the
domain_device reference under the port_lock to guarantee that the
ata_port stays around for the duration of eh.

Cc: Tejun Heo <tj@kernel.org>
Reviewed-by: Jacek Danecki <jacek.danecki@intel.com>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/ata/libata-core.c           |    4 ++
 drivers/ata/libata-eh.c             |   57 ++++++++++++++++++++++++++++-------
 drivers/scsi/libsas/sas_ata.c       |   38 +++++++++++++++++++++--
 drivers/scsi/libsas/sas_discover.c  |    6 ++--
 drivers/scsi/libsas/sas_event.c     |   12 ++++---
 drivers/scsi/libsas/sas_init.c      |   14 ++++-----
 drivers/scsi/libsas/sas_scsi_host.c |   27 +++++++++++++----
 include/linux/libata.h              |    4 ++
 include/scsi/libsas.h               |    4 ++
 include/scsi/sas_ata.h              |    5 +++
 10 files changed, 134 insertions(+), 37 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 28db50b..bd79c8b 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -80,6 +80,8 @@ const struct ata_port_operations ata_base_port_ops = {
 	.prereset		= ata_std_prereset,
 	.postreset		= ata_std_postreset,
 	.error_handler		= ata_std_error_handler,
+	.sched_eh		= ata_std_sched_eh,
+	.end_eh			= ata_std_end_eh,
 };
 
 const struct ata_port_operations sata_port_ops = {
@@ -6635,6 +6637,8 @@ struct ata_port_operations ata_dummy_port_ops = {
 	.qc_prep		= ata_noop_qc_prep,
 	.qc_issue		= ata_dummy_qc_issue,
 	.error_handler		= ata_dummy_error_handler,
+	.sched_eh		= ata_std_sched_eh,
+	.end_eh			= ata_std_end_eh,
 };
 
 const struct ata_port_info ata_dummy_port_info = {
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index c61316e..4f12f63 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -793,12 +793,12 @@ void ata_scsi_port_error_handler(struct Scsi_Host *host, struct ata_port *ap)
 		ata_for_each_link(link, ap, HOST_FIRST)
 			memset(&link->eh_info, 0, sizeof(link->eh_info));
 
-		/* Clear host_eh_scheduled while holding ap->lock such
-		 * that if exception occurs after this point but
-		 * before EH completion, SCSI midlayer will
+		/* end eh (clear host_eh_scheduled) while holding
+		 * ap->lock such that if exception occurs after this
+		 * point but before EH completion, SCSI midlayer will
 		 * re-initiate EH.
 		 */
-		host->host_eh_scheduled = 0;
+		ap->ops->end_eh(ap);
 
 		spin_unlock_irqrestore(ap->lock, flags);
 		ata_eh_release(ap);
@@ -986,16 +986,13 @@ void ata_qc_schedule_eh(struct ata_queued_cmd *qc)
 }
 
 /**
- *	ata_port_schedule_eh - schedule error handling without a qc
- *	@ap: ATA port to schedule EH for
- *
- *	Schedule error handling for @ap.  EH will kick in as soon as
- *	all commands are drained.
+ * ata_std_sched_eh - non-libsas ata_ports issue eh with this common routine
+ * @ap: ATA port to schedule EH for
  *
- *	LOCKING:
+ *	LOCKING: inherited from ata_port_schedule_eh
  *	spin_lock_irqsave(host lock)
  */
-void ata_port_schedule_eh(struct ata_port *ap)
+void ata_std_sched_eh(struct ata_port *ap)
 {
 	WARN_ON(!ap->ops->error_handler);
 
@@ -1007,6 +1004,44 @@ void ata_port_schedule_eh(struct ata_port *ap)
 
 	DPRINTK("port EH scheduled\n");
 }
+EXPORT_SYMBOL_GPL(ata_std_sched_eh);
+
+/**
+ * ata_std_end_eh - non-libsas ata_ports complete eh with this common routine
+ * @ap: ATA port to end EH for
+ *
+ * In the libata object model there is a 1:1 mapping of ata_port to
+ * shost, so host fields can be directly manipulated under ap->lock, in
+ * the libsas case we need to hold a lock at the ha->level to coordinate
+ * these events.
+ *
+ *	LOCKING:
+ *	spin_lock_irqsave(host lock)
+ */
+void ata_std_end_eh(struct ata_port *ap)
+{
+	struct Scsi_Host *host = ap->scsi_host;
+
+	host->host_eh_scheduled = 0;
+}
+EXPORT_SYMBOL(ata_std_end_eh);
+
+
+/**
+ *	ata_port_schedule_eh - schedule error handling without a qc
+ *	@ap: ATA port to schedule EH for
+ *
+ *	Schedule error handling for @ap.  EH will kick in as soon as
+ *	all commands are drained.
+ *
+ *	LOCKING:
+ *	spin_lock_irqsave(host lock)
+ */
+void ata_port_schedule_eh(struct ata_port *ap)
+{
+	/* see: ata_std_sched_eh, unless you know better */
+	ap->ops->sched_eh(ap);
+}
 
 static int ata_do_link_abort(struct ata_port *ap, struct ata_link *link)
 {
diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
index 6c31329..6a6c80f 100644
--- a/drivers/scsi/libsas/sas_ata.c
+++ b/drivers/scsi/libsas/sas_ata.c
@@ -523,6 +523,31 @@ static void sas_ata_set_dmamode(struct ata_port *ap, struct ata_device *ata_dev)
 		i->dft->lldd_ata_set_dmamode(dev);
 }
 
+static void sas_ata_sched_eh(struct ata_port *ap)
+{
+	struct domain_device *dev = ap->private_data;
+	struct sas_ha_struct *ha = dev->port->ha;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ha->lock, flags);
+	if (!test_and_set_bit(SAS_DEV_EH_PENDING, &dev->state))
+		ha->eh_active++;
+	ata_std_sched_eh(ap);
+	spin_unlock_irqrestore(&ha->lock, flags);
+}
+
+void sas_ata_end_eh(struct ata_port *ap)
+{
+	struct domain_device *dev = ap->private_data;
+	struct sas_ha_struct *ha = dev->port->ha;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ha->lock, flags);
+	if (test_and_clear_bit(SAS_DEV_EH_PENDING, &dev->state))
+		ha->eh_active--;
+	spin_unlock_irqrestore(&ha->lock, flags);
+}
+
 static struct ata_port_operations sas_sata_ops = {
 	.prereset		= ata_std_prereset,
 	.hardreset		= sas_ata_hard_reset,
@@ -536,6 +561,8 @@ static struct ata_port_operations sas_sata_ops = {
 	.port_start		= ata_sas_port_start,
 	.port_stop		= ata_sas_port_stop,
 	.set_dmamode		= sas_ata_set_dmamode,
+	.sched_eh		= sas_ata_sched_eh,
+	.end_eh			= sas_ata_end_eh,
 };
 
 static struct ata_port_info sata_port_info = {
@@ -707,10 +734,6 @@ static void async_sas_ata_eh(void *data, async_cookie_t cookie)
 	struct ata_port *ap = dev->sata_dev.ap;
 	struct sas_ha_struct *ha = dev->port->ha;
 
-	/* hold a reference over eh since we may be racing with final
-	 * remove once all commands are completed
-	 */
-	kref_get(&dev->kref);
 	sas_ata_printk(KERN_DEBUG, dev, "dev error handler\n");
 	ata_scsi_port_error_handler(ha->core.shost, ap);
 	sas_put_device(dev);
@@ -741,6 +764,13 @@ void sas_ata_strategy_handler(struct Scsi_Host *shost)
 		list_for_each_entry(dev, &port->dev_list, dev_list_node) {
 			if (!dev_is_sata(dev))
 				continue;
+
+			/* hold a reference over eh since we may be
+			 * racing with final remove once all commands
+			 * are completed
+			 */
+			kref_get(&dev->kref);
+
 			async_schedule_domain(async_sas_ata_eh, dev, &async);
 		}
 		spin_unlock(&port->dev_list_lock);
diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
index 629a086..ff497ac 100644
--- a/drivers/scsi/libsas/sas_discover.c
+++ b/drivers/scsi/libsas/sas_discover.c
@@ -294,6 +294,8 @@ static void sas_unregister_common_dev(struct asd_sas_port *port, struct domain_d
 
 	spin_lock_irq(&port->dev_list_lock);
 	list_del_init(&dev->dev_list_node);
+	if (dev_is_sata(dev))
+		sas_ata_end_eh(dev->sata_dev.ap);
 	spin_unlock_irq(&port->dev_list_lock);
 
 	sas_put_device(dev);
@@ -488,9 +490,9 @@ static void sas_chain_event(int event, unsigned long *pending,
 	if (!test_and_set_bit(event, pending)) {
 		unsigned long flags;
 
-		spin_lock_irqsave(&ha->state_lock, flags);
+		spin_lock_irqsave(&ha->lock, flags);
 		sas_chain_work(ha, sw);
-		spin_unlock_irqrestore(&ha->state_lock, flags);
+		spin_unlock_irqrestore(&ha->lock, flags);
 	}
 }
 
diff --git a/drivers/scsi/libsas/sas_event.c b/drivers/scsi/libsas/sas_event.c
index 4e4292d..789c4d8 100644
--- a/drivers/scsi/libsas/sas_event.c
+++ b/drivers/scsi/libsas/sas_event.c
@@ -47,9 +47,9 @@ static void sas_queue_event(int event, unsigned long *pending,
 	if (!test_and_set_bit(event, pending)) {
 		unsigned long flags;
 
-		spin_lock_irqsave(&ha->state_lock, flags);
+		spin_lock_irqsave(&ha->lock, flags);
 		sas_queue_work(ha, work);
-		spin_unlock_irqrestore(&ha->state_lock, flags);
+		spin_unlock_irqrestore(&ha->lock, flags);
 	}
 }
 
@@ -61,18 +61,18 @@ void __sas_drain_work(struct sas_ha_struct *ha)
 
 	set_bit(SAS_HA_DRAINING, &ha->state);
 	/* flush submitters */
-	spin_lock_irq(&ha->state_lock);
-	spin_unlock_irq(&ha->state_lock);
+	spin_lock_irq(&ha->lock);
+	spin_unlock_irq(&ha->lock);
 
 	drain_workqueue(wq);
 
-	spin_lock_irq(&ha->state_lock);
+	spin_lock_irq(&ha->lock);
 	clear_bit(SAS_HA_DRAINING, &ha->state);
 	list_for_each_entry_safe(sw, _sw, &ha->defer_q, drain_node) {
 		list_del_init(&sw->drain_node);
 		sas_queue_work(ha, sw);
 	}
-	spin_unlock_irq(&ha->state_lock);
+	spin_unlock_irq(&ha->lock);
 }
 
 int sas_drain_work(struct sas_ha_struct *ha)
diff --git a/drivers/scsi/libsas/sas_init.c b/drivers/scsi/libsas/sas_init.c
index 10cb5ae..6909fef 100644
--- a/drivers/scsi/libsas/sas_init.c
+++ b/drivers/scsi/libsas/sas_init.c
@@ -114,7 +114,7 @@ int sas_register_ha(struct sas_ha_struct *sas_ha)
 		sas_ha->lldd_queue_size = 128; /* Sanity */
 
 	set_bit(SAS_HA_REGISTERED, &sas_ha->state);
-	spin_lock_init(&sas_ha->state_lock);
+	spin_lock_init(&sas_ha->lock);
 	mutex_init(&sas_ha->drain_mutex);
 	INIT_LIST_HEAD(&sas_ha->defer_q);
 
@@ -163,9 +163,9 @@ int sas_unregister_ha(struct sas_ha_struct *sas_ha)
 	 * events to be queued, and flush any in-progress drainers
 	 */
 	mutex_lock(&sas_ha->drain_mutex);
-	spin_lock_irq(&sas_ha->state_lock);
+	spin_lock_irq(&sas_ha->lock);
 	clear_bit(SAS_HA_REGISTERED, &sas_ha->state);
-	spin_unlock_irq(&sas_ha->state_lock);
+	spin_unlock_irq(&sas_ha->lock);
 	__sas_drain_work(sas_ha);
 	mutex_unlock(&sas_ha->drain_mutex);
 
@@ -411,9 +411,9 @@ static int queue_phy_reset(struct sas_phy *phy, int hard_reset)
 	d->reset_result = 0;
 	d->hard_reset = hard_reset;
 
-	spin_lock_irq(&ha->state_lock);
+	spin_lock_irq(&ha->lock);
 	sas_queue_work(ha, &d->reset_work);
-	spin_unlock_irq(&ha->state_lock);
+	spin_unlock_irq(&ha->lock);
 
 	rc = sas_drain_work(ha);
 	if (rc == 0)
@@ -438,9 +438,9 @@ static int queue_phy_enable(struct sas_phy *phy, int enable)
 	d->enable_result = 0;
 	d->enable = enable;
 
-	spin_lock_irq(&ha->state_lock);
+	spin_lock_irq(&ha->lock);
 	sas_queue_work(ha, &d->enable_work);
-	spin_unlock_irq(&ha->state_lock);
+	spin_unlock_irq(&ha->lock);
 
 	rc = sas_drain_work(ha);
 	if (rc == 0)
diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index 17339e5..52d5b01 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -667,16 +667,20 @@ static void sas_eh_handle_sas_errors(struct Scsi_Host *shost, struct list_head *
 	goto out;
 }
 
+
 void sas_scsi_recover_host(struct Scsi_Host *shost)
 {
 	struct sas_ha_struct *ha = SHOST_TO_SAS_HA(shost);
-	unsigned long flags;
 	LIST_HEAD(eh_work_q);
+	int tries = 0;
+	bool retry;
 
-	spin_lock_irqsave(shost->host_lock, flags);
+retry:
+	tries++;
+	retry = true;
+	spin_lock_irq(shost->host_lock);
 	list_splice_init(&shost->eh_cmd_q, &eh_work_q);
-	shost->host_eh_scheduled = 0;
-	spin_unlock_irqrestore(shost->host_lock, flags);
+	spin_unlock_irq(shost->host_lock);
 
 	SAS_DPRINTK("Enter %s busy: %d failed: %d\n",
 		    __func__, shost->host_busy, shost->host_failed);
@@ -710,8 +714,19 @@ out:
 
 	scsi_eh_flush_done_q(&ha->eh_done_q);
 
-	SAS_DPRINTK("--- Exit %s: busy: %d failed: %d\n",
-		    __func__, shost->host_busy, shost->host_failed);
+	/* check if any new eh work was scheduled during the last run */
+	spin_lock_irq(&ha->lock);
+	if (ha->eh_active == 0) {
+		shost->host_eh_scheduled = 0;
+		retry = false;
+	}
+	spin_unlock_irq(&ha->lock);
+
+	if (retry)
+		goto retry;
+
+	SAS_DPRINTK("--- Exit %s: busy: %d failed: %d tries: %d\n",
+		    __func__, shost->host_busy, shost->host_failed, tries);
 }
 
 enum blk_eh_timer_return sas_scsi_timed_out(struct scsi_cmnd *cmd)
diff --git a/include/linux/libata.h b/include/linux/libata.h
index e926df7..a868bbc 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -845,6 +845,8 @@ struct ata_port_operations {
 	void (*error_handler)(struct ata_port *ap);
 	void (*lost_interrupt)(struct ata_port *ap);
 	void (*post_internal_cmd)(struct ata_queued_cmd *qc);
+	void (*sched_eh)(struct ata_port *ap);
+	void (*end_eh)(struct ata_port *ap);
 
 	/*
 	 * Optional features
@@ -1166,6 +1168,8 @@ extern void ata_do_eh(struct ata_port *ap, ata_prereset_fn_t prereset,
 		      ata_reset_fn_t softreset, ata_reset_fn_t hardreset,
 		      ata_postreset_fn_t postreset);
 extern void ata_std_error_handler(struct ata_port *ap);
+extern void ata_std_sched_eh(struct ata_port *ap);
+extern void ata_std_end_eh(struct ata_port *ap);
 extern int ata_link_nr_enabled(struct ata_link *link);
 
 /*
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index f4f1c96..2718b24 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -177,6 +177,7 @@ struct sata_device {
 enum {
 	SAS_DEV_GONE,
 	SAS_DEV_DESTROY,
+	SAS_DEV_EH_PENDING,
 };
 
 struct domain_device {
@@ -384,7 +385,8 @@ struct sas_ha_struct {
 	struct list_head  defer_q; /* work queued while draining */
 	struct mutex	  drain_mutex;
 	unsigned long	  state;
-	spinlock_t 	  state_lock;
+	spinlock_t	  lock;
+	int		  eh_active;
 
 	struct mutex disco_mutex;
 
diff --git a/include/scsi/sas_ata.h b/include/scsi/sas_ata.h
index 77670e8..2dfbdaa 100644
--- a/include/scsi/sas_ata.h
+++ b/include/scsi/sas_ata.h
@@ -45,6 +45,7 @@ void sas_ata_eh(struct Scsi_Host *shost, struct list_head *work_q,
 void sas_ata_schedule_reset(struct domain_device *dev);
 void sas_ata_wait_eh(struct domain_device *dev);
 void sas_probe_sata(struct asd_sas_port *port);
+void sas_ata_end_eh(struct ata_port *ap);
 #else
 
 
@@ -85,6 +86,10 @@ static inline int sas_get_ata_info(struct domain_device *dev, struct ex_phy *phy
 {
 	return 0;
 }
+
+static inline void sas_ata_end_eh(struct ata_port *ap)
+{
+}
 #endif
 
 #endif /* _SAS_ATA_H_ */


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 03/18] scsi: fix eh wakeup (scsi_schedule_eh vs scsi_restart_operations)
  2012-05-06 18:17 [PATCH 00/18] libsas, sas_ata: update for 3.5 Dan Williams
  2012-05-06 18:18 ` [PATCH 01/18] libsas: cleanup spurious calls to scsi_schedule_eh Dan Williams
  2012-05-06 18:18 ` [PATCH 02/18] libata, libsas: introduce sched_eh and end_eh port ops Dan Williams
@ 2012-05-06 18:18 ` Dan Williams
  2012-05-06 18:18 ` [PATCH 04/18] scsi_transport_sas: fix delete vs scan race Dan Williams
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: Dan Williams @ 2012-05-06 18:18 UTC (permalink / raw)
  To: linux-scsi; +Cc: Tejun Heo, linux-ide, Tom Jackson

Rapid ata hotplug on a libsas controller results in cases where libsas
is waiting indefinitely on eh to perform an ata probe.

A race exists between scsi_schedule_eh() and scsi_restart_operations()
in the case when scsi_restart_operations() issues i/o to other devices
in the sas domain.  When this happens the host state transitions from
SHOST_RECOVERY (set by scsi_schedule_eh) back to SHOST_RUNNING and
->host_busy is non-zero so we put the eh thread to sleep even though
->host_eh_scheduled is active.

Before putting the error handler to sleep we need to check if the
host_state needs to return to SHOST_RECOVERY for another trip through
eh.

Cc: Tejun Heo <tj@kernel.org>
Reported-by: Tom Jackson <thomas.p.jackson@intel.com>
Tested-by: Tom Jackson <thomas.p.jackson@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/scsi_error.c |   14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 386f0c5..cc8dc8c 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1687,6 +1687,20 @@ static void scsi_restart_operations(struct Scsi_Host *shost)
 	 * requests are started.
 	 */
 	scsi_run_host_queues(shost);
+
+	/*
+	 * if eh is active and host_eh_scheduled is pending we need to re-run
+	 * recovery.  we do this check after scsi_run_host_queues() to allow
+	 * everything pent up since the last eh run a chance to make forward
+	 * progress before we sync again.  Either we'll immediately re-run
+	 * recovery or scsi_device_unbusy() will wake us again when these
+	 * pending commands complete.
+	 */
+	spin_lock_irqsave(shost->host_lock, flags);
+	if (shost->host_eh_scheduled)
+		if (scsi_host_set_state(shost, SHOST_RECOVERY))
+			WARN_ON(scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY));
+	spin_unlock_irqrestore(shost->host_lock, flags);
 }
 
 /**


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 04/18] scsi_transport_sas: fix delete vs scan race
  2012-05-06 18:17 [PATCH 00/18] libsas, sas_ata: update for 3.5 Dan Williams
                   ` (2 preceding siblings ...)
  2012-05-06 18:18 ` [PATCH 03/18] scsi: fix eh wakeup (scsi_schedule_eh vs scsi_restart_operations) Dan Williams
@ 2012-05-06 18:18 ` Dan Williams
  2012-05-06 18:18 ` [PATCH 05/18] libsas: enforce eh strategy handlers only in eh context Dan Williams
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: Dan Williams @ 2012-05-06 18:18 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide

The following crash results from cases where the end_device has been
removed before scsi_sysfs_add_sdev has had a chance to run.

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
 IP: [<ffffffff8115e100>] sysfs_create_dir+0x32/0xb6
 ...
 Call Trace:
  [<ffffffff8125e4a8>] kobject_add_internal+0x120/0x1e3
  [<ffffffff81075149>] ? trace_hardirqs_on+0xd/0xf
  [<ffffffff8125e641>] kobject_add_varg+0x41/0x50
  [<ffffffff8125e70b>] kobject_add+0x64/0x66
  [<ffffffff8131122b>] device_add+0x12d/0x63a
  [<ffffffff814b65ea>] ? _raw_spin_unlock_irqrestore+0x47/0x56
  [<ffffffff8107de15>] ? module_refcount+0x89/0xa0
  [<ffffffff8132f348>] scsi_sysfs_add_sdev+0x4e/0x28a
  [<ffffffff8132dcbb>] do_scan_async+0x9c/0x145

...teach sas_rphy_remove to wait for async scanning to quiesce before
removing the end_device.  It seems this is a more general problem [1],
but this patch only addresses sas transport.

[1]: 23edb6e [SCSI] mpt2sas: Do not set sas_device->starget to NULL from
the slave_destroy callback when all the LUNS have been deleted

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/scsi_transport_sas.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_transport_sas.c b/drivers/scsi/scsi_transport_sas.c
index f7565fc..47abb90 100644
--- a/drivers/scsi/scsi_transport_sas.c
+++ b/drivers/scsi/scsi_transport_sas.c
@@ -33,8 +33,9 @@
 #include <linux/bsg.h>
 
 #include <scsi/scsi.h>
-#include <scsi/scsi_device.h>
 #include <scsi/scsi_host.h>
+#include <scsi/scsi_scan.h>
+#include <scsi/scsi_device.h>
 #include <scsi/scsi_transport.h>
 #include <scsi/scsi_transport_sas.h>
 
@@ -1667,6 +1668,9 @@ sas_rphy_remove(struct sas_rphy *rphy)
 {
 	struct device *dev = &rphy->dev;
 
+	/* prevent device_del() while child device_add() may be in-flight */
+	scsi_complete_async_scans();
+
 	switch (rphy->identify.device_type) {
 	case SAS_END_DEVICE:
 		scsi_remove_target(dev);


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 05/18] libsas: enforce eh strategy handlers only in eh context
  2012-05-06 18:17 [PATCH 00/18] libsas, sas_ata: update for 3.5 Dan Williams
                   ` (3 preceding siblings ...)
  2012-05-06 18:18 ` [PATCH 04/18] scsi_transport_sas: fix delete vs scan race Dan Williams
@ 2012-05-06 18:18 ` Dan Williams
  2012-05-06 18:18 ` [PATCH 06/18] libsas: add sas_eh_abort_handler Dan Williams
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: Dan Williams @ 2012-05-06 18:18 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide

The strategy handlers may be called in places that are problematic for
libsas (i.e. sata resets outside of domain revalidation filtering /
libata link recovery), or problematic for userspace (non-blocking ioctl
to sleeping reset functions).  However, these routines are also called
for eh escalations and recovery of scsi_eh_prep_cmnd(), so permit them
as long as we are running in the host's error handler, otherwise arrange
for them to be triggered in eh_context.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/libsas/sas_discover.c  |   11 +++
 drivers/scsi/libsas/sas_init.c      |    2 +
 drivers/scsi/libsas/sas_scsi_host.c |  121 ++++++++++++++++++++++++++++++++++-
 include/scsi/libsas.h               |   10 +++
 4 files changed, 140 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
index ff497ac..b031d23 100644
--- a/drivers/scsi/libsas/sas_discover.c
+++ b/drivers/scsi/libsas/sas_discover.c
@@ -39,6 +39,7 @@ void sas_init_dev(struct domain_device *dev)
 {
 	switch (dev->dev_type) {
 	case SAS_END_DEV:
+		INIT_LIST_HEAD(&dev->ssp_dev.eh_list_node);
 		break;
 	case EDGE_DEV:
 	case FANOUT_DEV:
@@ -286,6 +287,8 @@ void sas_free_device(struct kref *kref)
 
 static void sas_unregister_common_dev(struct asd_sas_port *port, struct domain_device *dev)
 {
+	struct sas_ha_struct *ha = port->ha;
+
 	sas_notify_lldd_dev_gone(dev);
 	if (!dev->parent)
 		dev->port->port_dev = NULL;
@@ -298,6 +301,14 @@ static void sas_unregister_common_dev(struct asd_sas_port *port, struct domain_d
 		sas_ata_end_eh(dev->sata_dev.ap);
 	spin_unlock_irq(&port->dev_list_lock);
 
+	spin_lock_irq(&ha->lock);
+	if (dev->dev_type == SAS_END_DEV &&
+	    !list_empty(&dev->ssp_dev.eh_list_node)) {
+		list_del_init(&dev->ssp_dev.eh_list_node);
+		ha->eh_active--;
+	}
+	spin_unlock_irq(&ha->lock);
+
 	sas_put_device(dev);
 }
 
diff --git a/drivers/scsi/libsas/sas_init.c b/drivers/scsi/libsas/sas_init.c
index 6909fef..1bbab3d 100644
--- a/drivers/scsi/libsas/sas_init.c
+++ b/drivers/scsi/libsas/sas_init.c
@@ -116,7 +116,9 @@ int sas_register_ha(struct sas_ha_struct *sas_ha)
 	set_bit(SAS_HA_REGISTERED, &sas_ha->state);
 	spin_lock_init(&sas_ha->lock);
 	mutex_init(&sas_ha->drain_mutex);
+	init_waitqueue_head(&sas_ha->eh_wait_q);
 	INIT_LIST_HEAD(&sas_ha->defer_q);
+	INIT_LIST_HEAD(&sas_ha->eh_dev_q);
 
 	error = sas_register_phys(sas_ha);
 	if (error) {
diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index 52d5b01..2e0e779 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -460,14 +460,88 @@ struct sas_phy *sas_get_local_phy(struct domain_device *dev)
 }
 EXPORT_SYMBOL_GPL(sas_get_local_phy);
 
+static void sas_wait_eh(struct domain_device *dev)
+{
+	struct sas_ha_struct *ha = dev->port->ha;
+	DEFINE_WAIT(wait);
+
+	if (dev_is_sata(dev)) {
+		ata_port_wait_eh(dev->sata_dev.ap);
+		return;
+	}
+ retry:
+	spin_lock_irq(&ha->lock);
+
+	while (test_bit(SAS_DEV_EH_PENDING, &dev->state)) {
+		prepare_to_wait(&ha->eh_wait_q, &wait, TASK_UNINTERRUPTIBLE);
+		spin_unlock_irq(&ha->lock);
+		schedule();
+		spin_lock_irq(&ha->lock);
+	}
+	finish_wait(&ha->eh_wait_q, &wait);
+
+	spin_unlock_irq(&ha->lock);
+
+	/* make sure SCSI EH is complete */
+	if (scsi_host_in_recovery(ha->core.shost)) {
+		msleep(10);
+		goto retry;
+	}
+}
+EXPORT_SYMBOL(sas_wait_eh);
+
+static int sas_queue_reset(struct domain_device *dev, int reset_type, int lun, int wait)
+{
+	struct sas_ha_struct *ha = dev->port->ha;
+	int scheduled = 0, tries = 100;
+
+	/* ata: promote lun reset to bus reset */
+	if (dev_is_sata(dev)) {
+		sas_ata_schedule_reset(dev);
+		if (wait)
+			sas_ata_wait_eh(dev);
+		return SUCCESS;
+	}
+
+	while (!scheduled && tries--) {
+		spin_lock_irq(&ha->lock);
+		if (!test_bit(SAS_DEV_EH_PENDING, &dev->state) &&
+		    !test_bit(reset_type, &dev->state)) {
+			scheduled = 1;
+			ha->eh_active++;
+			list_add_tail(&dev->ssp_dev.eh_list_node, &ha->eh_dev_q);
+			set_bit(SAS_DEV_EH_PENDING, &dev->state);
+			set_bit(reset_type, &dev->state);
+			int_to_scsilun(lun, &dev->ssp_dev.reset_lun);
+			scsi_schedule_eh(ha->core.shost);
+		}
+		spin_unlock_irq(&ha->lock);
+
+		if (wait)
+			sas_wait_eh(dev);
+
+		if (scheduled)
+			return SUCCESS;
+	}
+
+	SAS_DPRINTK("%s reset of %s failed\n",
+		    reset_type == SAS_DEV_LU_RESET ? "LUN" : "Bus",
+		    dev_name(&dev->rphy->dev));
+
+	return FAILED;
+}
+
 /* Attempt to send a LUN reset message to a device */
 int sas_eh_device_reset_handler(struct scsi_cmnd *cmd)
 {
-	struct domain_device *dev = cmd_to_domain_dev(cmd);
-	struct sas_internal *i =
-		to_sas_internal(dev->port->ha->core.shost->transportt);
-	struct scsi_lun lun;
 	int res;
+	struct scsi_lun lun;
+	struct Scsi_Host *host = cmd->device->host;
+	struct domain_device *dev = cmd_to_domain_dev(cmd);
+	struct sas_internal *i = to_sas_internal(host->transportt);
+
+	if (current != host->ehandler)
+		return sas_queue_reset(dev, SAS_DEV_LU_RESET, cmd->device->lun, 0);
 
 	int_to_scsilun(cmd->device->lun, &lun);
 
@@ -486,8 +560,12 @@ int sas_eh_bus_reset_handler(struct scsi_cmnd *cmd)
 {
 	struct domain_device *dev = cmd_to_domain_dev(cmd);
 	struct sas_phy *phy = sas_get_local_phy(dev);
+	struct Scsi_Host *host = cmd->device->host;
 	int res;
 
+	if (current != host->ehandler)
+		return sas_queue_reset(dev, SAS_DEV_RESET, 0, 0);
+
 	res = sas_phy_reset(phy, 1);
 	if (res)
 		SAS_DPRINTK("Bus reset of %s failed 0x%x\n",
@@ -667,6 +745,39 @@ static void sas_eh_handle_sas_errors(struct Scsi_Host *shost, struct list_head *
 	goto out;
 }
 
+static void sas_eh_handle_resets(struct Scsi_Host *shost)
+{
+	struct sas_ha_struct *ha = SHOST_TO_SAS_HA(shost);
+	struct sas_internal *i = to_sas_internal(shost->transportt);
+
+	/* handle directed resets to sas devices */
+	spin_lock_irq(&ha->lock);
+	while (!list_empty(&ha->eh_dev_q)) {
+		struct domain_device *dev;
+		struct ssp_device *ssp;
+
+		ssp = list_entry(ha->eh_dev_q.next, typeof(*ssp), eh_list_node);
+		list_del_init(&ssp->eh_list_node);
+		dev = container_of(ssp, typeof(*dev), ssp_dev);
+		kref_get(&dev->kref);
+		WARN_ONCE(dev_is_sata(dev), "ssp reset to ata device?\n");
+
+		spin_unlock_irq(&ha->lock);
+
+		if (test_and_clear_bit(SAS_DEV_LU_RESET, &dev->state))
+			i->dft->lldd_lu_reset(dev, ssp->reset_lun.scsi_lun);
+
+		if (test_and_clear_bit(SAS_DEV_RESET, &dev->state))
+			i->dft->lldd_I_T_nexus_reset(dev);
+
+		sas_put_device(dev);
+		spin_lock_irq(&ha->lock);
+		clear_bit(SAS_DEV_EH_PENDING, &dev->state);
+		ha->eh_active--;
+	}
+	spin_unlock_irq(&ha->lock);
+}
+
 
 void sas_scsi_recover_host(struct Scsi_Host *shost)
 {
@@ -709,6 +820,8 @@ out:
 	if (ha->lldd_max_execute_num > 1)
 		wake_up_process(ha->core.queue_thread);
 
+	sas_eh_handle_resets(shost);
+
 	/* now link into libata eh --- if we have any ata devices */
 	sas_ata_strategy_handler(shost);
 
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index 2718b24..ade862a 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -174,10 +174,17 @@ struct sata_device {
 	struct ata_taskfile tf;
 };
 
+struct ssp_device {
+	struct list_head eh_list_node; /* pending a user requested eh action */
+	struct scsi_lun reset_lun;
+};
+
 enum {
 	SAS_DEV_GONE,
 	SAS_DEV_DESTROY,
 	SAS_DEV_EH_PENDING,
+	SAS_DEV_LU_RESET,
+	SAS_DEV_RESET,
 };
 
 struct domain_device {
@@ -211,6 +218,7 @@ struct domain_device {
         union {
                 struct expander_device ex_dev;
                 struct sata_device     sata_dev; /* STP & directly attached */
+		struct ssp_device      ssp_dev;
         };
 
         void *lldd_dev;
@@ -387,6 +395,8 @@ struct sas_ha_struct {
 	unsigned long	  state;
 	spinlock_t	  lock;
 	int		  eh_active;
+	wait_queue_head_t eh_wait_q;
+	struct list_head  eh_dev_q;
 
 	struct mutex disco_mutex;
 


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 06/18] libsas: add sas_eh_abort_handler
  2012-05-06 18:17 [PATCH 00/18] libsas, sas_ata: update for 3.5 Dan Williams
                   ` (4 preceding siblings ...)
  2012-05-06 18:18 ` [PATCH 05/18] libsas: enforce eh strategy handlers only in eh context Dan Williams
@ 2012-05-06 18:18 ` Dan Williams
  2012-05-06 18:18 ` [PATCH 07/18] libsas: use ->lldd_I_T_nexus_reset for ->eh_bus_reset_handler Dan Williams
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: Dan Williams @ 2012-05-06 18:18 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide, Jacek Danecki

When recovering failed eh-cmnds let the lldd attempt an abort via
scsi_abort_eh_cmnd before escalating.

Reviewed-by: Jacek Danecki <jacek.danecki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/libsas/sas_scsi_host.c |   21 +++++++++++++++++++++
 include/scsi/libsas.h               |    1 +
 2 files changed, 22 insertions(+)

diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index 2e0e779..875b871 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -531,6 +531,27 @@ static int sas_queue_reset(struct domain_device *dev, int reset_type, int lun, i
 	return FAILED;
 }
 
+int sas_eh_abort_handler(struct scsi_cmnd *cmd)
+{
+	int res;
+	struct sas_task *task = TO_SAS_TASK(cmd);
+	struct Scsi_Host *host = cmd->device->host;
+	struct sas_internal *i = to_sas_internal(host->transportt);
+
+	if (current != host->ehandler)
+		return FAILED;
+
+	if (!i->dft->lldd_abort_task)
+		return FAILED;
+
+	res = i->dft->lldd_abort_task(task);
+	if (res == TMF_RESP_FUNC_SUCC || res == TMF_RESP_FUNC_COMPLETE)
+		return SUCCESS;
+
+	return FAILED;
+}
+EXPORT_SYMBOL_GPL(sas_eh_abort_handler);
+
 /* Attempt to send a LUN reset message to a device */
 int sas_eh_device_reset_handler(struct scsi_cmnd *cmd)
 {
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index ade862a..7c4bd08 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -718,6 +718,7 @@ void sas_unregister_dev(struct asd_sas_port *port, struct domain_device *);
 void sas_init_dev(struct domain_device *);
 
 void sas_task_abort(struct sas_task *);
+int sas_eh_abort_handler(struct scsi_cmnd *cmd);
 int sas_eh_device_reset_handler(struct scsi_cmnd *cmd);
 int sas_eh_bus_reset_handler(struct scsi_cmnd *cmd);
 


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 07/18] libsas: use ->lldd_I_T_nexus_reset for ->eh_bus_reset_handler
  2012-05-06 18:17 [PATCH 00/18] libsas, sas_ata: update for 3.5 Dan Williams
                   ` (5 preceding siblings ...)
  2012-05-06 18:18 ` [PATCH 06/18] libsas: add sas_eh_abort_handler Dan Williams
@ 2012-05-06 18:18 ` Dan Williams
  2012-05-06 18:18 ` [PATCH 08/18] isci: use sas eh strategy handlers Dan Williams
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: Dan Williams @ 2012-05-06 18:18 UTC (permalink / raw)
  To: linux-scsi
  Cc: linux-ide, Jacek Danecki, Jack Wang, Luben Tuikov, Xiangliang Yu

sas_eh_bus_reset_handler() amounts to sas_phy_reset() without
notification of the reset to the lldd.  If this is triggered from
eh-cmnd recovery there may be sas_tasks for the lldd to terminate, so
->lldd_I_T_nexus_reset is warranted.

Cc: Xiangliang Yu <yuxiangl@marvell.com>
Cc: Luben Tuikov <ltuikov@yahoo.com>
Cc: Jack Wang <jack_wang@usish.com>
Reviewed-by: Jacek Danecki <jacek.danecki@intel.com>
[jacek: modify pm8001_I_T_nexus_reset to return -ENODEV]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/libsas/sas_scsi_host.c |   19 ++++++++-----------
 drivers/scsi/pm8001/pm8001_sas.c    |    3 ++-
 2 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index 875b871..6764148 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -576,25 +576,22 @@ int sas_eh_device_reset_handler(struct scsi_cmnd *cmd)
 	return FAILED;
 }
 
-/* Attempt to send a phy (bus) reset */
 int sas_eh_bus_reset_handler(struct scsi_cmnd *cmd)
 {
-	struct domain_device *dev = cmd_to_domain_dev(cmd);
-	struct sas_phy *phy = sas_get_local_phy(dev);
-	struct Scsi_Host *host = cmd->device->host;
 	int res;
+	struct Scsi_Host *host = cmd->device->host;
+	struct domain_device *dev = cmd_to_domain_dev(cmd);
+	struct sas_internal *i = to_sas_internal(host->transportt);
 
 	if (current != host->ehandler)
 		return sas_queue_reset(dev, SAS_DEV_RESET, 0, 0);
 
-	res = sas_phy_reset(phy, 1);
-	if (res)
-		SAS_DPRINTK("Bus reset of %s failed 0x%x\n",
-			    kobject_name(&phy->dev.kobj),
-			    res);
-	sas_put_local_phy(phy);
+	if (!i->dft->lldd_I_T_nexus_reset)
+		return FAILED;
 
-	if (res == TMF_RESP_FUNC_SUCC || res == TMF_RESP_FUNC_COMPLETE)
+	res = i->dft->lldd_I_T_nexus_reset(dev);
+	if (res == TMF_RESP_FUNC_SUCC || res == TMF_RESP_FUNC_COMPLETE ||
+	    res == -ENODEV)
 		return SUCCESS;
 
 	return FAILED;
diff --git a/drivers/scsi/pm8001/pm8001_sas.c b/drivers/scsi/pm8001/pm8001_sas.c
index 3b11edd..fdbba57 100644
--- a/drivers/scsi/pm8001/pm8001_sas.c
+++ b/drivers/scsi/pm8001/pm8001_sas.c
@@ -962,8 +962,9 @@ int pm8001_I_T_nexus_reset(struct domain_device *dev)
 	struct pm8001_device *pm8001_dev;
 	struct pm8001_hba_info *pm8001_ha;
 	struct sas_phy *phy;
+
 	if (!dev || !dev->lldd_dev)
-		return -1;
+		return -ENODEV;
 
 	pm8001_dev = dev->lldd_dev;
 	pm8001_ha = pm8001_find_ha_by_dev(dev);


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 08/18] isci: use sas eh strategy handlers
  2012-05-06 18:17 [PATCH 00/18] libsas, sas_ata: update for 3.5 Dan Williams
                   ` (6 preceding siblings ...)
  2012-05-06 18:18 ` [PATCH 07/18] libsas: use ->lldd_I_T_nexus_reset for ->eh_bus_reset_handler Dan Williams
@ 2012-05-06 18:18 ` Dan Williams
  2012-05-06 18:18 ` [PATCH 09/18] libsas: trim sas_task of slow path infrastructure Dan Williams
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: Dan Williams @ 2012-05-06 18:18 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide

...now that the strategy handlers guarantee eh context and notify
the driver of bus reset.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/isci/init.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/scsi/isci/init.c b/drivers/scsi/isci/init.c
index 5137db5..a98280d 100644
--- a/drivers/scsi/isci/init.c
+++ b/drivers/scsi/isci/init.c
@@ -166,6 +166,9 @@ static struct scsi_host_template isci_sht = {
 	.sg_tablesize			= SG_ALL,
 	.max_sectors			= SCSI_DEFAULT_MAX_SECTORS,
 	.use_clustering			= ENABLE_CLUSTERING,
+	.eh_abort_handler		= sas_eh_abort_handler,
+	.eh_device_reset_handler        = sas_eh_device_reset_handler,
+	.eh_bus_reset_handler           = sas_eh_bus_reset_handler,
 	.target_destroy			= sas_target_destroy,
 	.ioctl				= sas_ioctl,
 	.shost_attrs			= isci_host_attrs,


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 09/18] libsas: trim sas_task of slow path infrastructure
  2012-05-06 18:17 [PATCH 00/18] libsas, sas_ata: update for 3.5 Dan Williams
                   ` (7 preceding siblings ...)
  2012-05-06 18:18 ` [PATCH 08/18] isci: use sas eh strategy handlers Dan Williams
@ 2012-05-06 18:18 ` Dan Williams
  2012-05-06 18:18 ` [PATCH 10/18] libsas: sas_rediscover_dev did not look at the SMP exec status Dan Williams
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: Dan Williams @ 2012-05-06 18:18 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide, Christoph Hellwig, Jack Wang

The timer and the completion are only used for slow path tasks (smp, and
lldd tmfs), yet we incur the allocation space and cpu setup time for
every fast path task.

Cc: Christoph Hellwig <hch@lst.de>
Acked-by: Jack Wang <jack_wang@usish.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/libsas/sas_expander.c  |   20 ++++++++++----------
 drivers/scsi/libsas/sas_init.c      |   23 +++++++++++++++++++++--
 drivers/scsi/libsas/sas_scsi_host.c |    8 ++++++--
 drivers/scsi/mvsas/mv_sas.c         |   20 ++++++++++----------
 drivers/scsi/pm8001/pm8001_sas.c    |   34 +++++++++++++++++-----------------
 include/scsi/libsas.h               |   14 +++++++++-----
 6 files changed, 73 insertions(+), 46 deletions(-)

diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index caa0525..f1e8e0a 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -51,14 +51,14 @@ static void smp_task_timedout(unsigned long _task)
 		task->task_state_flags |= SAS_TASK_STATE_ABORTED;
 	spin_unlock_irqrestore(&task->task_state_lock, flags);
 
-	complete(&task->completion);
+	complete(&task->slow_task->completion);
 }
 
 static void smp_task_done(struct sas_task *task)
 {
-	if (!del_timer(&task->timer))
+	if (!del_timer(&task->slow_task->timer))
 		return;
-	complete(&task->completion);
+	complete(&task->slow_task->completion);
 }
 
 /* Give it some long enough timeout. In seconds. */
@@ -79,7 +79,7 @@ static int smp_execute_task(struct domain_device *dev, void *req, int req_size,
 			break;
 		}
 
-		task = sas_alloc_task(GFP_KERNEL);
+		task = sas_alloc_slow_task(GFP_KERNEL);
 		if (!task) {
 			res = -ENOMEM;
 			break;
@@ -91,20 +91,20 @@ static int smp_execute_task(struct domain_device *dev, void *req, int req_size,
 
 		task->task_done = smp_task_done;
 
-		task->timer.data = (unsigned long) task;
-		task->timer.function = smp_task_timedout;
-		task->timer.expires = jiffies + SMP_TIMEOUT*HZ;
-		add_timer(&task->timer);
+		task->slow_task->timer.data = (unsigned long) task;
+		task->slow_task->timer.function = smp_task_timedout;
+		task->slow_task->timer.expires = jiffies + SMP_TIMEOUT*HZ;
+		add_timer(&task->slow_task->timer);
 
 		res = i->dft->lldd_execute_task(task, 1, GFP_KERNEL);
 
 		if (res) {
-			del_timer(&task->timer);
+			del_timer(&task->slow_task->timer);
 			SAS_DPRINTK("executing SMP task failed:%d\n", res);
 			break;
 		}
 
-		wait_for_completion(&task->completion);
+		wait_for_completion(&task->slow_task->completion);
 		res = -ECOMM;
 		if ((task->task_state_flags & SAS_TASK_STATE_ABORTED)) {
 			SAS_DPRINTK("smp task timed out or aborted\n");
diff --git a/drivers/scsi/libsas/sas_init.c b/drivers/scsi/libsas/sas_init.c
index 1bbab3d..014297c 100644
--- a/drivers/scsi/libsas/sas_init.c
+++ b/drivers/scsi/libsas/sas_init.c
@@ -48,18 +48,37 @@ struct sas_task *sas_alloc_task(gfp_t flags)
 		INIT_LIST_HEAD(&task->list);
 		spin_lock_init(&task->task_state_lock);
 		task->task_state_flags = SAS_TASK_STATE_PENDING;
-		init_timer(&task->timer);
-		init_completion(&task->completion);
 	}
 
 	return task;
 }
 EXPORT_SYMBOL_GPL(sas_alloc_task);
 
+struct sas_task *sas_alloc_slow_task(gfp_t flags)
+{
+	struct sas_task *task = sas_alloc_task(flags);
+	struct sas_task_slow *slow = kmalloc(sizeof(*slow), flags);
+
+	if (!task || !slow) {
+		if (task)
+			kmem_cache_free(sas_task_cache, task);
+		kfree(slow);
+		return NULL;
+	}
+
+	task->slow_task = slow;
+	init_timer(&slow->timer);
+	init_completion(&slow->completion);
+
+	return task;
+}
+EXPORT_SYMBOL_GPL(sas_alloc_slow_task);
+
 void sas_free_task(struct sas_task *task)
 {
 	if (task) {
 		BUG_ON(!list_empty(&task->list));
+		kfree(task->slow_task);
 		kmem_cache_free(sas_task_cache, task);
 	}
 }
diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index 6764148..6e795a1 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -1134,9 +1134,13 @@ void sas_task_abort(struct sas_task *task)
 
 	/* Escape for libsas internal commands */
 	if (!sc) {
-		if (!del_timer(&task->timer))
+		struct sas_task_slow *slow = task->slow_task;
+
+		if (!slow)
+			return;
+		if (!del_timer(&slow->timer))
 			return;
-		task->timer.function(task->timer.data);
+		slow->timer.function(slow->timer.data);
 		return;
 	}
 
diff --git a/drivers/scsi/mvsas/mv_sas.c b/drivers/scsi/mvsas/mv_sas.c
index fd3b283..7d46c1a 100644
--- a/drivers/scsi/mvsas/mv_sas.c
+++ b/drivers/scsi/mvsas/mv_sas.c
@@ -1365,9 +1365,9 @@ void mvs_dev_gone(struct domain_device *dev)
 
 static void mvs_task_done(struct sas_task *task)
 {
-	if (!del_timer(&task->timer))
+	if (!del_timer(&task->slow_task->timer))
 		return;
-	complete(&task->completion);
+	complete(&task->slow_task->completion);
 }
 
 static void mvs_tmf_timedout(unsigned long data)
@@ -1375,7 +1375,7 @@ static void mvs_tmf_timedout(unsigned long data)
 	struct sas_task *task = (struct sas_task *)data;
 
 	task->task_state_flags |= SAS_TASK_STATE_ABORTED;
-	complete(&task->completion);
+	complete(&task->slow_task->completion);
 }
 
 #define MVS_TASK_TIMEOUT 20
@@ -1386,7 +1386,7 @@ static int mvs_exec_internal_tmf_task(struct domain_device *dev,
 	struct sas_task *task = NULL;
 
 	for (retry = 0; retry < 3; retry++) {
-		task = sas_alloc_task(GFP_KERNEL);
+		task = sas_alloc_slow_task(GFP_KERNEL);
 		if (!task)
 			return -ENOMEM;
 
@@ -1396,20 +1396,20 @@ static int mvs_exec_internal_tmf_task(struct domain_device *dev,
 		memcpy(&task->ssp_task, parameter, para_len);
 		task->task_done = mvs_task_done;
 
-		task->timer.data = (unsigned long) task;
-		task->timer.function = mvs_tmf_timedout;
-		task->timer.expires = jiffies + MVS_TASK_TIMEOUT*HZ;
-		add_timer(&task->timer);
+		task->slow_task->timer.data = (unsigned long) task;
+		task->slow_task->timer.function = mvs_tmf_timedout;
+		task->slow_task->timer.expires = jiffies + MVS_TASK_TIMEOUT*HZ;
+		add_timer(&task->slow_task->timer);
 
 		res = mvs_task_exec(task, 1, GFP_KERNEL, NULL, 1, tmf);
 
 		if (res) {
-			del_timer(&task->timer);
+			del_timer(&task->slow_task->timer);
 			mv_printk("executing internel task failed:%d\n", res);
 			goto ex_err;
 		}
 
-		wait_for_completion(&task->completion);
+		wait_for_completion(&task->slow_task->completion);
 		res = TMF_RESP_FUNC_FAILED;
 		/* Even TMF timed out, return direct. */
 		if ((task->task_state_flags & SAS_TASK_STATE_ABORTED)) {
diff --git a/drivers/scsi/pm8001/pm8001_sas.c b/drivers/scsi/pm8001/pm8001_sas.c
index fdbba57..b961112 100644
--- a/drivers/scsi/pm8001/pm8001_sas.c
+++ b/drivers/scsi/pm8001/pm8001_sas.c
@@ -650,9 +650,9 @@ int pm8001_dev_found(struct domain_device *dev)
 
 static void pm8001_task_done(struct sas_task *task)
 {
-	if (!del_timer(&task->timer))
+	if (!del_timer(&task->slow_task->timer))
 		return;
-	complete(&task->completion);
+	complete(&task->slow_task->completion);
 }
 
 static void pm8001_tmf_timedout(unsigned long data)
@@ -660,7 +660,7 @@ static void pm8001_tmf_timedout(unsigned long data)
 	struct sas_task *task = (struct sas_task *)data;
 
 	task->task_state_flags |= SAS_TASK_STATE_ABORTED;
-	complete(&task->completion);
+	complete(&task->slow_task->completion);
 }
 
 #define PM8001_TASK_TIMEOUT 20
@@ -683,7 +683,7 @@ static int pm8001_exec_internal_tmf_task(struct domain_device *dev,
 	struct pm8001_hba_info *pm8001_ha = pm8001_find_ha_by_dev(dev);
 
 	for (retry = 0; retry < 3; retry++) {
-		task = sas_alloc_task(GFP_KERNEL);
+		task = sas_alloc_slow_task(GFP_KERNEL);
 		if (!task)
 			return -ENOMEM;
 
@@ -691,21 +691,21 @@ static int pm8001_exec_internal_tmf_task(struct domain_device *dev,
 		task->task_proto = dev->tproto;
 		memcpy(&task->ssp_task, parameter, para_len);
 		task->task_done = pm8001_task_done;
-		task->timer.data = (unsigned long)task;
-		task->timer.function = pm8001_tmf_timedout;
-		task->timer.expires = jiffies + PM8001_TASK_TIMEOUT*HZ;
-		add_timer(&task->timer);
+		task->slow_task->timer.data = (unsigned long)task;
+		task->slow_task->timer.function = pm8001_tmf_timedout;
+		task->slow_task->timer.expires = jiffies + PM8001_TASK_TIMEOUT*HZ;
+		add_timer(&task->slow_task->timer);
 
 		res = pm8001_task_exec(task, 1, GFP_KERNEL, 1, tmf);
 
 		if (res) {
-			del_timer(&task->timer);
+			del_timer(&task->slow_task->timer);
 			PM8001_FAIL_DBG(pm8001_ha,
 				pm8001_printk("Executing internal task "
 				"failed\n"));
 			goto ex_err;
 		}
-		wait_for_completion(&task->completion);
+		wait_for_completion(&task->slow_task->completion);
 		res = -TMF_RESP_FUNC_FAILED;
 		/* Even TMF timed out, return direct. */
 		if ((task->task_state_flags & SAS_TASK_STATE_ABORTED)) {
@@ -765,17 +765,17 @@ pm8001_exec_internal_task_abort(struct pm8001_hba_info *pm8001_ha,
 	struct sas_task *task = NULL;
 
 	for (retry = 0; retry < 3; retry++) {
-		task = sas_alloc_task(GFP_KERNEL);
+		task = sas_alloc_slow_task(GFP_KERNEL);
 		if (!task)
 			return -ENOMEM;
 
 		task->dev = dev;
 		task->task_proto = dev->tproto;
 		task->task_done = pm8001_task_done;
-		task->timer.data = (unsigned long)task;
-		task->timer.function = pm8001_tmf_timedout;
-		task->timer.expires = jiffies + PM8001_TASK_TIMEOUT * HZ;
-		add_timer(&task->timer);
+		task->slow_task->timer.data = (unsigned long)task;
+		task->slow_task->timer.function = pm8001_tmf_timedout;
+		task->slow_task->timer.expires = jiffies + PM8001_TASK_TIMEOUT * HZ;
+		add_timer(&task->slow_task->timer);
 
 		res = pm8001_tag_alloc(pm8001_ha, &ccb_tag);
 		if (res)
@@ -789,13 +789,13 @@ pm8001_exec_internal_task_abort(struct pm8001_hba_info *pm8001_ha,
 			pm8001_dev, flag, task_tag, ccb_tag);
 
 		if (res) {
-			del_timer(&task->timer);
+			del_timer(&task->slow_task->timer);
 			PM8001_FAIL_DBG(pm8001_ha,
 				pm8001_printk("Executing internal task "
 				"failed\n"));
 			goto ex_err;
 		}
-		wait_for_completion(&task->completion);
+		wait_for_completion(&task->slow_task->completion);
 		res = TMF_RESP_FUNC_FAILED;
 		/* Even TMF timed out, return direct. */
 		if ((task->task_state_flags & SAS_TASK_STATE_ABORTED)) {
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index 7c4bd08..996ee7c 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -612,10 +612,6 @@ struct sas_task {
 
 	enum   sas_protocol      task_proto;
 
-	/* Used by the discovery code. */
-	struct timer_list     timer;
-	struct completion     completion;
-
 	union {
 		struct sas_ata_task ata_task;
 		struct sas_smp_task smp_task;
@@ -632,8 +628,15 @@ struct sas_task {
 
 	void   *lldd_task;	  /* for use by LLDDs */
 	void   *uldd_task;
+	struct sas_task_slow *slow_task;
+};
 
-	struct work_struct abort_work;
+struct sas_task_slow {
+	/* standard/extra infrastructure for slow path commands (SMP and
+	 * internal lldd commands
+	 */
+	struct timer_list     timer;
+	struct completion     completion;
 };
 
 #define SAS_TASK_STATE_PENDING      1
@@ -643,6 +646,7 @@ struct sas_task {
 #define SAS_TASK_AT_INITIATOR       16
 
 extern struct sas_task *sas_alloc_task(gfp_t flags);
+extern struct sas_task *sas_alloc_slow_task(gfp_t flags);
 extern void sas_free_task(struct sas_task *task);
 
 struct sas_domain_function_template {


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 10/18] libsas: sas_rediscover_dev did not look at the SMP exec status.
  2012-05-06 18:17 [PATCH 00/18] libsas, sas_ata: update for 3.5 Dan Williams
                   ` (8 preceding siblings ...)
  2012-05-06 18:18 ` [PATCH 09/18] libsas: trim sas_task of slow path infrastructure Dan Williams
@ 2012-05-06 18:18 ` Dan Williams
  2012-05-06 18:18 ` [PATCH 11/18] mvsas: remove unused variable in mvs_task_exec() Dan Williams
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: Dan Williams @ 2012-05-06 18:18 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide, Jeff Skirvin

From: Jeff Skirvin <jeffrey.d.skirvin@intel.com>

The discovery function "sas_rediscover_dev" had two bugs: 1) it did
not pay attention to the return status from the SMP task execution;
2) the stack variable used for the returned SAS address was compared
against 0 without being initialized.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
[djbw: todo sanitize smp_execute_task return values (see: residue)]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/libsas/sas_expander.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index f1e8e0a..63d3e59 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -2005,6 +2005,7 @@ static int sas_rediscover_dev(struct domain_device *dev, int phy_id, bool last)
 	u8 sas_addr[8];
 	int res;
 
+	memset(sas_addr, 0, 8);
 	res = sas_get_phy_attached_dev(dev, phy_id, sas_addr, &type);
 	switch (res) {
 	case SMP_RESP_NO_PHY:
@@ -2017,9 +2018,13 @@ static int sas_rediscover_dev(struct domain_device *dev, int phy_id, bool last)
 		return res;
 	case SMP_RESP_FUNC_ACC:
 		break;
+	case -ECOMM:
+		break;
+	default:
+		return res;
 	}
 
-	if (SAS_ADDR(sas_addr) == 0) {
+	if ((SAS_ADDR(sas_addr) == 0) || (res == -ECOMM)) {
 		phy->phy_state = PHY_EMPTY;
 		sas_unregister_devs_sas_addr(dev, phy_id, last);
 		return res;


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 11/18] mvsas: remove unused variable in mvs_task_exec()
  2012-05-06 18:17 [PATCH 00/18] libsas, sas_ata: update for 3.5 Dan Williams
                   ` (9 preceding siblings ...)
  2012-05-06 18:18 ` [PATCH 10/18] libsas: sas_rediscover_dev did not look at the SMP exec status Dan Williams
@ 2012-05-06 18:18 ` Dan Williams
  2012-05-06 18:18 ` [PATCH 12/18] libata: reset once Dan Williams
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: Dan Williams @ 2012-05-06 18:18 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide, Dan Carpenter

From: Dan Carpenter <dan.carpenter@oracle.com>

We don't use "dev" any more after 07ec747a5f ("libsas: remove
ata_port.lock management duties from lldds") and it causes a compile
warning.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/mvsas/mv_sas.c |    1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/scsi/mvsas/mv_sas.c b/drivers/scsi/mvsas/mv_sas.c
index 7d46c1a..4539d59 100644
--- a/drivers/scsi/mvsas/mv_sas.c
+++ b/drivers/scsi/mvsas/mv_sas.c
@@ -885,7 +885,6 @@ static int mvs_task_exec(struct sas_task *task, const int num, gfp_t gfp_flags,
 				struct completion *completion, int is_tmf,
 				struct mvs_tmf_task *tmf)
 {
-	struct domain_device *dev = task->dev;
 	struct mvs_info *mvi = NULL;
 	u32 rc = 0;
 	u32 pass = 0;


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 12/18] libata: reset once
  2012-05-06 18:17 [PATCH 00/18] libsas, sas_ata: update for 3.5 Dan Williams
                   ` (10 preceding siblings ...)
  2012-05-06 18:18 ` [PATCH 11/18] mvsas: remove unused variable in mvs_task_exec() Dan Williams
@ 2012-05-06 18:18 ` Dan Williams
  2012-05-06 18:19 ` [PATCH 13/18] libsas: continue revalidation Dan Williams
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: Dan Williams @ 2012-05-06 18:18 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide

Hotplug testing with libsas currently encounters a 55 second wait for
link recovery to give up.  In the case where the user trusts the
response time of their devices permit the recovery attempts to be
limited to one.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 Documentation/kernel-parameters.txt |    3 +++
 drivers/ata/libata-core.c           |    1 +
 drivers/ata/libata-eh.c             |    2 ++
 include/linux/libata.h              |    1 +
 4 files changed, 7 insertions(+)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index c1601e5..9dbcc16 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1319,6 +1319,9 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			* nohrst, nosrst, norst: suppress hard, soft
                           and both resets.
 
+			* rstonce: only attempt one reset during
+			  hot-unplug link recovery
+
 			* dump_id: dump IDENTIFY data.
 
 			If there are multiple matching configurations changing
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index bd79c8b..171597f 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -6381,6 +6381,7 @@ static int __init ata_parse_force_one(char **cur,
 		{ "nohrst",	.lflags		= ATA_LFLAG_NO_HRST },
 		{ "nosrst",	.lflags		= ATA_LFLAG_NO_SRST },
 		{ "norst",	.lflags		= ATA_LFLAG_NO_HRST | ATA_LFLAG_NO_SRST },
+		{ "rstonce",	.lflags		= ATA_LFLAG_RST_ONCE },
 	};
 	char *start = *cur, *p = *cur;
 	char *id, *val, *endp;
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 4f12f63..b02748b 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -2605,6 +2605,8 @@ int ata_eh_reset(struct ata_link *link, int classify,
 	 */
 	while (ata_eh_reset_timeouts[max_tries] != ULONG_MAX)
 		max_tries++;
+	if (link->flags & ATA_LFLAG_RST_ONCE)
+		max_tries = 1;
 	if (link->flags & ATA_LFLAG_NO_HRST)
 		hardreset = NULL;
 	if (link->flags & ATA_LFLAG_NO_SRST)
diff --git a/include/linux/libata.h b/include/linux/libata.h
index a868bbc..0a3c99a 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -182,6 +182,7 @@ enum {
 	ATA_LFLAG_DISABLED	= (1 << 6), /* link is disabled */
 	ATA_LFLAG_SW_ACTIVITY	= (1 << 7), /* keep activity stats */
 	ATA_LFLAG_NO_LPM	= (1 << 8), /* disable LPM on this link */
+	ATA_LFLAG_RST_ONCE	= (1 << 9), /* limit recovery to one reset */
 
 	/* struct ata_port flags */
 	ATA_FLAG_SLAVE_POSS	= (1 << 0), /* host supports slave dev */


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 13/18] libsas: continue revalidation
  2012-05-06 18:17 [PATCH 00/18] libsas, sas_ata: update for 3.5 Dan Williams
                   ` (11 preceding siblings ...)
  2012-05-06 18:18 ` [PATCH 12/18] libata: reset once Dan Williams
@ 2012-05-06 18:19 ` Dan Williams
  2012-05-06 18:19 ` [PATCH 14/18] libata: export ata_port suspend/resume infrastructure for sas Dan Williams
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: Dan Williams @ 2012-05-06 18:19 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide

Continue running revalidation until no more broadcast devices are
discovered.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/libsas/sas_expander.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index 63d3e59..a4b15c1 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -2114,9 +2114,7 @@ int sas_ex_revalidate_domain(struct domain_device *port_dev)
 	struct domain_device *dev = NULL;
 
 	res = sas_find_bcast_dev(port_dev, &dev);
-	if (res)
-		goto out;
-	if (dev) {
+	while (res == 0 && dev) {
 		struct expander_device *ex = &dev->ex_dev;
 		int i = 0, phy_id;
 
@@ -2128,8 +2126,10 @@ int sas_ex_revalidate_domain(struct domain_device *port_dev)
 			res = sas_rediscover(dev, phy_id);
 			i = phy_id + 1;
 		} while (i < ex->num_phys);
+
+		dev = NULL;
+		res = sas_find_bcast_dev(port_dev, &dev);
 	}
-out:
 	return res;
 }
 


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 14/18] libata: export ata_port suspend/resume infrastructure for sas
  2012-05-06 18:17 [PATCH 00/18] libsas, sas_ata: update for 3.5 Dan Williams
                   ` (12 preceding siblings ...)
  2012-05-06 18:19 ` [PATCH 13/18] libsas: continue revalidation Dan Williams
@ 2012-05-06 18:19 ` Dan Williams
  2012-05-06 18:19 ` [PATCH 15/18] libsas: drop sata port multiplier infrastructure Dan Williams
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: Dan Williams @ 2012-05-06 18:19 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide, Jacek Danecki

Reuse ata_port_{suspend|resume}_common for sas.  This path is chosen
over adding coordination between ata-tranport and sas-transport because
libsas wants to revalidate the domain at resume-time at the host level.
It can not validate links have resumed properly until libata has had a
chance to perform its revalidation, and any sane placing of an ata_port
in the sas-transport model would delay it's resumption until after the
host.

Export the common portion of port suspend/resume (bypass pm_runtime),
and allow sas to perform these operations asynchronously (similar to the
libsas async-ata probe implmentation).  Async operation is determined by
having an external, rather than stack based, location for storing the
result of the operation.

Reviewed-by: Jacek Danecki <jacek.danecki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/ata/libata-core.c |   58 ++++++++++++++++++++++++++++++++++++---------
 include/linux/libata.h    |   11 +++++++++
 2 files changed, 57 insertions(+), 12 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 171597f..a3c6669 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -5241,16 +5241,20 @@ bool ata_link_offline(struct ata_link *link)
 #ifdef CONFIG_PM
 static int ata_port_request_pm(struct ata_port *ap, pm_message_t mesg,
 			       unsigned int action, unsigned int ehi_flags,
-			       int wait)
+			       int *async)
 {
 	struct ata_link *link;
 	unsigned long flags;
-	int rc;
+	int rc = 0;
 
 	/* Previous resume operation might still be in
 	 * progress.  Wait for PM_PENDING to clear.
 	 */
 	if (ap->pflags & ATA_PFLAG_PM_PENDING) {
+		if (async) {
+			*async = -EAGAIN;
+			return 0;
+		}
 		ata_port_wait_eh(ap);
 		WARN_ON(ap->pflags & ATA_PFLAG_PM_PENDING);
 	}
@@ -5259,10 +5263,10 @@ static int ata_port_request_pm(struct ata_port *ap, pm_message_t mesg,
 	spin_lock_irqsave(ap->lock, flags);
 
 	ap->pm_mesg = mesg;
-	if (wait) {
-		rc = 0;
+	if (async)
+		ap->pm_result = async;
+	else
 		ap->pm_result = &rc;
-	}
 
 	ap->pflags |= ATA_PFLAG_PM_PENDING;
 	ata_for_each_link(link, ap, HOST_FIRST) {
@@ -5275,7 +5279,7 @@ static int ata_port_request_pm(struct ata_port *ap, pm_message_t mesg,
 	spin_unlock_irqrestore(ap->lock, flags);
 
 	/* wait and check result */
-	if (wait) {
+	if (!async) {
 		ata_port_wait_eh(ap);
 		WARN_ON(ap->pflags & ATA_PFLAG_PM_PENDING);
 	}
@@ -5285,9 +5289,8 @@ static int ata_port_request_pm(struct ata_port *ap, pm_message_t mesg,
 
 #define to_ata_port(d) container_of(d, struct ata_port, tdev)
 
-static int ata_port_suspend_common(struct device *dev, pm_message_t mesg)
+static int __ata_port_suspend_common(struct ata_port *ap, pm_message_t mesg, int *async)
 {
-	struct ata_port *ap = to_ata_port(dev);
 	unsigned int ehi_flags = ATA_EHI_QUIET;
 	int rc;
 
@@ -5302,10 +5305,17 @@ static int ata_port_suspend_common(struct device *dev, pm_message_t mesg)
 	if (mesg.event == PM_EVENT_SUSPEND)
 		ehi_flags |= ATA_EHI_NO_AUTOPSY | ATA_EHI_NO_RECOVERY;
 
-	rc = ata_port_request_pm(ap, mesg, 0, ehi_flags, 1);
+	rc = ata_port_request_pm(ap, mesg, 0, ehi_flags, async);
 	return rc;
 }
 
+static int ata_port_suspend_common(struct device *dev, pm_message_t mesg)
+{
+	struct ata_port *ap = to_ata_port(dev);
+
+	return __ata_port_suspend_common(ap, mesg, NULL);
+}
+
 static int ata_port_suspend(struct device *dev)
 {
 	if (pm_runtime_suspended(dev))
@@ -5330,16 +5340,22 @@ static int ata_port_poweroff(struct device *dev)
 	return ata_port_suspend_common(dev, PMSG_HIBERNATE);
 }
 
-static int ata_port_resume_common(struct device *dev)
+static int __ata_port_resume_common(struct ata_port *ap, int *async)
 {
-	struct ata_port *ap = to_ata_port(dev);
 	int rc;
 
 	rc = ata_port_request_pm(ap, PMSG_ON, ATA_EH_RESET,
-		ATA_EHI_NO_AUTOPSY | ATA_EHI_QUIET, 1);
+		ATA_EHI_NO_AUTOPSY | ATA_EHI_QUIET, async);
 	return rc;
 }
 
+static int ata_port_resume_common(struct device *dev)
+{
+	struct ata_port *ap = to_ata_port(dev);
+
+	return __ata_port_resume_common(ap, NULL);
+}
+
 static int ata_port_resume(struct device *dev)
 {
 	int rc;
@@ -5372,6 +5388,24 @@ static const struct dev_pm_ops ata_port_pm_ops = {
 	.runtime_idle = ata_port_runtime_idle,
 };
 
+/* sas ports don't participate in pm runtime management of ata_ports,
+ * and need to resume ata devices at the domain level, not the per-port
+ * level. sas suspend/resume is async to allow parallel port recovery
+ * since sas has multiple ata_port instances per Scsi_Host.
+ */
+int ata_sas_port_async_suspend(struct ata_port *ap, int *async)
+{
+	return __ata_port_suspend_common(ap, PMSG_SUSPEND, async);
+}
+EXPORT_SYMBOL_GPL(ata_sas_port_async_suspend);
+
+int ata_sas_port_async_resume(struct ata_port *ap, int *async)
+{
+	return __ata_port_resume_common(ap, async);
+}
+EXPORT_SYMBOL_GPL(ata_sas_port_async_resume);
+
+
 /**
  *	ata_host_suspend - suspend host
  *	@host: host to suspend
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 0a3c99a..ae6366f 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -1015,6 +1015,17 @@ extern bool ata_link_offline(struct ata_link *link);
 #ifdef CONFIG_PM
 extern int ata_host_suspend(struct ata_host *host, pm_message_t mesg);
 extern void ata_host_resume(struct ata_host *host);
+extern int ata_sas_port_async_suspend(struct ata_port *ap, int *async);
+extern int ata_sas_port_async_resume(struct ata_port *ap, int *async);
+#else
+static inline int ata_sas_port_async_suspend(struct ata_port *ap, int *async)
+{
+	return 0;
+}
+static inline int ata_sas_port_async_resume(struct ata_port *ap, int *async)
+{
+	return 0;
+}
 #endif
 extern int ata_ratelimit(void);
 extern void ata_msleep(struct ata_port *ap, unsigned int msecs);


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 15/18] libsas: drop sata port multiplier infrastructure
  2012-05-06 18:17 [PATCH 00/18] libsas, sas_ata: update for 3.5 Dan Williams
                   ` (13 preceding siblings ...)
  2012-05-06 18:19 ` [PATCH 14/18] libata: export ata_port suspend/resume infrastructure for sas Dan Williams
@ 2012-05-06 18:19 ` Dan Williams
  2012-05-06 18:19 ` [PATCH 16/18] scsi, sd: limit the scope of the async probe domain Dan Williams
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: Dan Williams @ 2012-05-06 18:19 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide

On the way to add a new sata_device field, noticed that libsas is
carrying port multiplier infrastructure that is explicitly disabled by
sas_discover_sata().  The aic94xx touches the unused port_no, so leave
that field in case there was some use for it.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/libsas/sas_discover.c |    6 ------
 include/scsi/libsas.h              |    1 -
 2 files changed, 7 deletions(-)

diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
index b031d23..3e9dc1a 100644
--- a/drivers/scsi/libsas/sas_discover.c
+++ b/drivers/scsi/libsas/sas_discover.c
@@ -46,12 +46,6 @@ void sas_init_dev(struct domain_device *dev)
 		INIT_LIST_HEAD(&dev->ex_dev.children);
 		mutex_init(&dev->ex_dev.cmd_mutex);
 		break;
-	case SATA_DEV:
-	case SATA_PM:
-	case SATA_PM_PORT:
-	case SATA_PENDING:
-		INIT_LIST_HEAD(&dev->sata_dev.children);
-		break;
 	default:
 		break;
 	}
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index 996ee7c..066c0ff 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -167,7 +167,6 @@ struct sata_device {
         enum   ata_command_set command_set;
         struct smp_resp        rps_resp; /* report_phy_sata_resp */
         u8     port_no;        /* port number, if this is a PM (Port) */
-        struct list_head children; /* PM Ports if this is a PM */
 
 	struct ata_port *ap;
 	struct ata_host ata_host;


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 16/18] scsi, sd: limit the scope of the async probe domain
  2012-05-06 18:17 [PATCH 00/18] libsas, sas_ata: update for 3.5 Dan Williams
                   ` (14 preceding siblings ...)
  2012-05-06 18:19 ` [PATCH 15/18] libsas: drop sata port multiplier infrastructure Dan Williams
@ 2012-05-06 18:19 ` Dan Williams
  2012-05-06 18:19 ` [PATCH 17/18] libsas: suspend / resume support Dan Williams
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: Dan Williams @ 2012-05-06 18:19 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide, Alan Stern

sd injects and synchronizes probe work on the global kernel-wide domain.
This runs into conflict with PM that wants to perform resume actions in
async context:

[  494.237079] INFO: task kworker/u:3:554 blocked for more than 120 seconds.
[  494.294396] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  494.360809] kworker/u:3     D 0000000000000000     0   554      2 0x00000000
[  494.420739]  ffff88012e4d3af0 0000000000000046 ffff88013200c160 ffff88012e4d3fd8
[  494.484392]  ffff88012e4d3fd8 0000000000012500 ffff8801394ea0b0 ffff88013200c160
[  494.548038]  ffff88012e4d3ae0 00000000000001e3 ffffffff81a249e0 ffff8801321c5398
[  494.611685] Call Trace:
[  494.632649]  [<ffffffff8149dd25>] schedule+0x5a/0x5c
[  494.674687]  [<ffffffff8104b968>] async_synchronize_cookie_domain+0xb6/0x112
[  494.734177]  [<ffffffff810461ff>] ? __init_waitqueue_head+0x50/0x50
[  494.787134]  [<ffffffff8131a224>] ? scsi_remove_target+0x48/0x48
[  494.837900]  [<ffffffff8104b9d9>] async_synchronize_cookie+0x15/0x17
[  494.891567]  [<ffffffff8104ba49>] async_synchronize_full+0x54/0x70  <-- here we wait for async contexts to complete
[  494.943783]  [<ffffffff8104b9f5>] ? async_synchronize_full_domain+0x1a/0x1a
[  495.002547]  [<ffffffffa00114b1>] sd_remove+0x2c/0xa2 [sd_mod]
[  495.051861]  [<ffffffff812fe94f>] __device_release_driver+0x86/0xcf
[  495.104807]  [<ffffffff812fe9bd>] device_release_driver+0x25/0x32  <-- here we take device_lock()

[  853.511341] INFO: task kworker/u:4:549 blocked for more than 120 seconds.
[  853.568693] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  853.635119] kworker/u:4     D ffff88013097b5d0     0   549      2 0x00000000
[  853.695129]  ffff880132773c40 0000000000000046 ffff880130790000 ffff880132773fd8
[  853.758990]  ffff880132773fd8 0000000000012500 ffff88013288a0b0 ffff880130790000
[  853.822796]  0000000000000246 0000000000000040 ffff88013097b5c8 ffff880130790000
[  853.886633] Call Trace:
[  853.907631]  [<ffffffff8149dd25>] schedule+0x5a/0x5c
[  853.949670]  [<ffffffff8149cc44>] __mutex_lock_common+0x220/0x351
[  854.001225]  [<ffffffff81304bd7>] ? device_resume+0x58/0x1c4
[  854.049082]  [<ffffffff81304bd7>] ? device_resume+0x58/0x1c4
[  854.097011]  [<ffffffff8149ce48>] mutex_lock_nested+0x2f/0x36   <-- here we wait for device_lock()
[  854.145591]  [<ffffffff81304bd7>] device_resume+0x58/0x1c4
[  854.192066]  [<ffffffff81304d61>] async_resume+0x1e/0x45
[  854.237019]  [<ffffffff8104bc93>] async_run_entry_fn+0xc6/0x173  <-- ...while running in async context

Provide a 'scsi_sd_probe_domain' so that async probe actions actions can
be flushed without regard for the state of PM, and allow for the resume
path to handle devices that have transitioned from SDEV_QUIESCE to
SDEV_DEL prior to resume.

Acked-by: Alan Stern <stern@rowland.harvard.edu>
[alan: uplevel scsi_sd_probe_domain, clarify scsi_device_resume]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/scsi.c      |    4 ++++
 drivers/scsi/scsi_lib.c  |   10 +++++++---
 drivers/scsi/scsi_pm.c   |    2 +-
 drivers/scsi/scsi_priv.h |    2 ++
 drivers/scsi/sd.c        |    5 +++--
 5 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 07322ec..2ecdb01 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -90,6 +90,10 @@ unsigned int scsi_logging_level;
 EXPORT_SYMBOL(scsi_logging_level);
 #endif
 
+/* sd and scsi_pm need to coordinate flushing async actions */
+LIST_HEAD(scsi_sd_probe_domain);
+EXPORT_SYMBOL(scsi_sd_probe_domain);
+
 /* NB: These are exposed through /proc/scsi/scsi and form part of the ABI.
  * You may not alter any existing entry (although adding new ones is
  * encouraged once assigned by ANSI/INCITS T10
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 5dfd749..62ddfd3 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2348,10 +2348,14 @@ EXPORT_SYMBOL(scsi_device_quiesce);
  *
  *	Must be called with user context, may sleep.
  */
-void
-scsi_device_resume(struct scsi_device *sdev)
+void scsi_device_resume(struct scsi_device *sdev)
 {
-	if(scsi_device_set_state(sdev, SDEV_RUNNING))
+	/* check if the device state was mutated prior to resume, and if
+	 * so assume the state is being managed elsewhere (for example
+	 * device deleted during suspend)
+	 */
+	if (sdev->sdev_state != SDEV_QUIESCE ||
+	    scsi_device_set_state(sdev, SDEV_RUNNING))
 		return;
 	scsi_run_queue(sdev->request_queue);
 }
diff --git a/drivers/scsi/scsi_pm.c b/drivers/scsi/scsi_pm.c
index c467064..f661a41 100644
--- a/drivers/scsi/scsi_pm.c
+++ b/drivers/scsi/scsi_pm.c
@@ -97,7 +97,7 @@ static int scsi_bus_prepare(struct device *dev)
 {
 	if (scsi_is_sdev_device(dev)) {
 		/* sd probing uses async_schedule.  Wait until it finishes. */
-		async_synchronize_full();
+		async_synchronize_full_domain(&scsi_sd_probe_domain);
 
 	} else if (scsi_is_host_device(dev)) {
 		/* Wait until async scanning is finished */
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index be4fa6d..07ce3f5 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -163,6 +163,8 @@ static inline int scsi_autopm_get_host(struct Scsi_Host *h) { return 0; }
 static inline void scsi_autopm_put_host(struct Scsi_Host *h) {}
 #endif /* CONFIG_PM_RUNTIME */
 
+extern struct list_head scsi_sd_probe_domain;
+
 /* 
  * internal scsi timeout functions: for use by mid-layer and transport
  * classes.
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 5ba5c2a..6f0a4c6 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -65,6 +65,7 @@
 #include <scsi/scsicam.h>
 
 #include "sd.h"
+#include "scsi_priv.h"
 #include "scsi_logging.h"
 
 MODULE_AUTHOR("Eric Youngdale");
@@ -2722,7 +2723,7 @@ static int sd_probe(struct device *dev)
 	dev_set_drvdata(dev, sdkp);
 
 	get_device(&sdkp->dev);	/* prevent release before async_schedule */
-	async_schedule(sd_probe_async, sdkp);
+	async_schedule_domain(sd_probe_async, sdkp, &scsi_sd_probe_domain);
 
 	return 0;
 
@@ -2756,7 +2757,7 @@ static int sd_remove(struct device *dev)
 	sdkp = dev_get_drvdata(dev);
 	scsi_autopm_get_device(sdkp->device);
 
-	async_synchronize_full();
+	async_synchronize_full_domain(&scsi_sd_probe_domain);
 	blk_queue_prep_rq(sdkp->device->request_queue, scsi_prep_fn);
 	blk_queue_unprep_rq(sdkp->device->request_queue, NULL);
 	device_del(&sdkp->dev);


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 17/18] libsas: suspend / resume support
  2012-05-06 18:17 [PATCH 00/18] libsas, sas_ata: update for 3.5 Dan Williams
                   ` (15 preceding siblings ...)
  2012-05-06 18:19 ` [PATCH 16/18] scsi, sd: limit the scope of the async probe domain Dan Williams
@ 2012-05-06 18:19 ` Dan Williams
  2012-05-06 18:19 ` [PATCH 18/18] scsi: cleanup setting task state in scsi_error_handler() Dan Williams
  2012-05-31 18:12 ` [PATCH 00/18] libsas, sas_ata: update for 3.5 Dan Williams
  18 siblings, 0 replies; 21+ messages in thread
From: Dan Williams @ 2012-05-06 18:19 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide, Alan Stern, Maciej Patelczyk, Jacek Danecki

libsas power management routines to suspend and recover the sas domain
based on a model where the lldd is allowed and expected to be
"forgetful".

sas_suspend_ha - disable event processing allowing the lldd to take down
                 links without concern for causing hotplug events.
                 Regardless of whether the lldd actually posts link down
                 messages libsas notifies the lldd that all
                 domain_devices are gone.

sas_prep_resume_ha - on the way back up before the lldd starts link
                     training clean out any spurious events that were
                     generated on the way down, and re-enable event
                     processing

sas_resume_ha - after the lldd has started and decided that all phys
		have posted link-up events this routine is called to let
		libsas start it's own timeout of any phys that did not
		resume.  After the timeout an lldd can cancel the
                phy teardown by posting a link-up event.

Storage for ex_change_count (u16) and phy_change_count (u8) are changed
to int so they can be set to -1 to indicate 'invalidated'.

Cc: Alan Stern <stern@rowland.harvard.edu>
Reviewed-by: Jacek Danecki <jacek.danecki@intel.com>
Tested-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/libsas/sas_ata.c      |   86 ++++++++++++++++++++++++++++++++++
 drivers/scsi/libsas/sas_discover.c |   69 ++++++++++++++++++++++++----
 drivers/scsi/libsas/sas_dump.c     |    1 
 drivers/scsi/libsas/sas_event.c    |    4 +-
 drivers/scsi/libsas/sas_init.c     |   90 ++++++++++++++++++++++++++++++++++++
 drivers/scsi/libsas/sas_internal.h |    1 
 drivers/scsi/libsas/sas_phy.c      |   21 ++++++++
 drivers/scsi/libsas/sas_port.c     |   52 ++++++++++++++++++++-
 include/scsi/libsas.h              |   20 ++++++--
 include/scsi/sas_ata.h             |   10 ++++
 10 files changed, 335 insertions(+), 19 deletions(-)

diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
index 6a6c80f..607a35b 100644
--- a/drivers/scsi/libsas/sas_ata.c
+++ b/drivers/scsi/libsas/sas_ata.c
@@ -700,6 +700,92 @@ void sas_probe_sata(struct asd_sas_port *port)
 		if (ata_dev_disabled(sas_to_ata_dev(dev)))
 			sas_fail_probe(dev, __func__, -ENODEV);
 	}
+
+}
+
+static bool sas_ata_flush_pm_eh(struct asd_sas_port *port, const char *func)
+{
+	struct domain_device *dev, *n;
+	bool retry = false;
+
+	list_for_each_entry_safe(dev, n, &port->dev_list, dev_list_node) {
+		int rc;
+
+		if (!dev_is_sata(dev))
+			continue;
+
+		sas_ata_wait_eh(dev);
+		rc = dev->sata_dev.pm_result;
+		if (rc == -EAGAIN)
+			retry = true;
+		else if (rc) {
+			/* since we don't have a
+			 * ->port_{suspend|resume} routine in our
+			 *  ata_port ops, and no entanglements with
+			 *  acpi, suspend should just be mechanical trip
+			 *  through eh, catch cases where these
+			 *  assumptions are invalidated
+			 */
+			WARN_ONCE(1, "failed %s %s error: %d\n", func,
+				 dev_name(&dev->rphy->dev), rc);
+		}
+
+		/* if libata failed to power manage the device, tear it down */
+		if (ata_dev_disabled(sas_to_ata_dev(dev)))
+			sas_fail_probe(dev, func, -ENODEV);
+	}
+
+	return retry;
+}
+
+void sas_suspend_sata(struct asd_sas_port *port)
+{
+	struct domain_device *dev;
+
+ retry:
+	mutex_lock(&port->ha->disco_mutex);
+	list_for_each_entry(dev, &port->dev_list, dev_list_node) {
+		struct sata_device *sata;
+
+		if (!dev_is_sata(dev))
+			continue;
+
+		sata = &dev->sata_dev;
+		if (sata->ap->pm_mesg.event == PM_EVENT_SUSPEND)
+			continue;
+
+		sata->pm_result = -EIO;
+		ata_sas_port_async_suspend(sata->ap, &sata->pm_result);
+	}
+	mutex_unlock(&port->ha->disco_mutex);
+
+	if (sas_ata_flush_pm_eh(port, __func__))
+		goto retry;
+}
+
+void sas_resume_sata(struct asd_sas_port *port)
+{
+	struct domain_device *dev;
+
+ retry:
+	mutex_lock(&port->ha->disco_mutex);
+	list_for_each_entry(dev, &port->dev_list, dev_list_node) {
+		struct sata_device *sata;
+
+		if (!dev_is_sata(dev))
+			continue;
+
+		sata = &dev->sata_dev;
+		if (sata->ap->pm_mesg.event == PM_EVENT_ON)
+			continue;
+
+		sata->pm_result = -EIO;
+		ata_sas_port_async_resume(sata->ap, &sata->pm_result);
+	}
+	mutex_unlock(&port->ha->disco_mutex);
+
+	if (sas_ata_flush_pm_eh(port, __func__))
+		goto retry;
 }
 
 /**
diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
index 3e9dc1a..a0c3003 100644
--- a/drivers/scsi/libsas/sas_discover.c
+++ b/drivers/scsi/libsas/sas_discover.c
@@ -24,6 +24,7 @@
 
 #include <linux/scatterlist.h>
 #include <linux/slab.h>
+#include <linux/async.h>
 #include <scsi/scsi_host.h>
 #include <scsi/scsi_eh.h>
 #include "sas_internal.h"
@@ -180,16 +181,18 @@ int sas_notify_lldd_dev_found(struct domain_device *dev)
 	struct Scsi_Host *shost = sas_ha->core.shost;
 	struct sas_internal *i = to_sas_internal(shost->transportt);
 
-	if (i->dft->lldd_dev_found) {
-		res = i->dft->lldd_dev_found(dev);
-		if (res) {
-			printk("sas: driver on pcidev %s cannot handle "
-			       "device %llx, error:%d\n",
-			       dev_name(sas_ha->dev),
-			       SAS_ADDR(dev->sas_addr), res);
-		}
-		kref_get(&dev->kref);
+	if (!i->dft->lldd_dev_found)
+		return 0;
+
+	res = i->dft->lldd_dev_found(dev);
+	if (res) {
+		printk("sas: driver on pcidev %s cannot handle "
+		       "device %llx, error:%d\n",
+		       dev_name(sas_ha->dev),
+		       SAS_ADDR(dev->sas_addr), res);
 	}
+	set_bit(SAS_DEV_FOUND, &dev->state);
+	kref_get(&dev->kref);
 	return res;
 }
 
@@ -200,7 +203,10 @@ void sas_notify_lldd_dev_gone(struct domain_device *dev)
 	struct Scsi_Host *shost = sas_ha->core.shost;
 	struct sas_internal *i = to_sas_internal(shost->transportt);
 
-	if (i->dft->lldd_dev_gone) {
+	if (!i->dft->lldd_dev_gone)
+		return;
+
+	if (test_and_clear_bit(SAS_DEV_FOUND, &dev->state)) {
 		i->dft->lldd_dev_gone(dev);
 		sas_put_device(dev);
 	}
@@ -234,6 +240,47 @@ static void sas_probe_devices(struct work_struct *work)
 	}
 }
 
+static void sas_suspend_devices(struct work_struct *work)
+{
+	struct asd_sas_phy *phy;
+	struct domain_device *dev;
+	struct sas_discovery_event *ev = to_sas_discovery_event(work);
+	struct asd_sas_port *port = ev->port;
+	struct Scsi_Host *shost = port->ha->core.shost;
+	struct sas_internal *si = to_sas_internal(shost->transportt);
+
+	clear_bit(DISCE_SUSPEND, &port->disc.pending);
+
+	sas_suspend_sata(port);
+
+	/* lldd is free to forget the domain_device across the
+	 * suspension, we force the issue here to keep the reference
+	 * counts aligned
+	 */
+	list_for_each_entry(dev, &port->dev_list, dev_list_node)
+		sas_notify_lldd_dev_gone(dev);
+
+	/* we are suspending, so we know events are disabled and
+	 * phy_list is not being mutated
+	 */
+	list_for_each_entry(phy, &port->phy_list, port_phy_el) {
+		if (si->dft->lldd_port_formed)
+			si->dft->lldd_port_deformed(phy);
+		phy->suspended = 1;
+		port->suspended = 1;
+	}
+}
+
+static void sas_resume_devices(struct work_struct *work)
+{
+	struct sas_discovery_event *ev = to_sas_discovery_event(work);
+	struct asd_sas_port *port = ev->port;
+
+	clear_bit(DISCE_RESUME, &port->disc.pending);
+
+	sas_resume_sata(port);
+}
+
 /**
  * sas_discover_end_dev -- discover an end device (SSP, etc)
  * @end: pointer to domain device of interest
@@ -530,6 +577,8 @@ void sas_init_disc(struct sas_discovery *disc, struct asd_sas_port *port)
 		[DISCE_DISCOVER_DOMAIN] = sas_discover_domain,
 		[DISCE_REVALIDATE_DOMAIN] = sas_revalidate_domain,
 		[DISCE_PROBE] = sas_probe_devices,
+		[DISCE_SUSPEND] = sas_suspend_devices,
+		[DISCE_RESUME] = sas_resume_devices,
 		[DISCE_DESTRUCT] = sas_destruct_devices,
 	};
 
diff --git a/drivers/scsi/libsas/sas_dump.c b/drivers/scsi/libsas/sas_dump.c
index fc46093..cd6f99c 100644
--- a/drivers/scsi/libsas/sas_dump.c
+++ b/drivers/scsi/libsas/sas_dump.c
@@ -41,6 +41,7 @@ static const char *sas_phye_str[] = {
 	[1] = "PHYE_OOB_DONE",
 	[2] = "PHYE_OOB_ERROR",
 	[3] = "PHYE_SPINUP_HOLD",
+	[4] = "PHYE_RESUME_TIMEOUT",
 };
 
 void sas_dprint_porte(int phyid, enum port_event pe)
diff --git a/drivers/scsi/libsas/sas_event.c b/drivers/scsi/libsas/sas_event.c
index 789c4d8..aadbd53 100644
--- a/drivers/scsi/libsas/sas_event.c
+++ b/drivers/scsi/libsas/sas_event.c
@@ -134,7 +134,7 @@ static void notify_port_event(struct asd_sas_phy *phy, enum port_event event)
 			&phy->port_events[event].work, ha);
 }
 
-static void notify_phy_event(struct asd_sas_phy *phy, enum phy_event event)
+void sas_notify_phy_event(struct asd_sas_phy *phy, enum phy_event event)
 {
 	struct sas_ha_struct *ha = phy->ha;
 
@@ -159,7 +159,7 @@ int sas_init_events(struct sas_ha_struct *sas_ha)
 
 	sas_ha->notify_ha_event = notify_ha_event;
 	sas_ha->notify_port_event = notify_port_event;
-	sas_ha->notify_phy_event = notify_phy_event;
+	sas_ha->notify_phy_event = sas_notify_phy_event;
 
 	return 0;
 }
diff --git a/drivers/scsi/libsas/sas_init.c b/drivers/scsi/libsas/sas_init.c
index 014297c..dbc8a79 100644
--- a/drivers/scsi/libsas/sas_init.c
+++ b/drivers/scsi/libsas/sas_init.c
@@ -178,7 +178,7 @@ Undo_phys:
 	return error;
 }
 
-int sas_unregister_ha(struct sas_ha_struct *sas_ha)
+static void sas_disable_events(struct sas_ha_struct *sas_ha)
 {
 	/* Set the state to unregistered to avoid further unchained
 	 * events to be queued, and flush any in-progress drainers
@@ -189,7 +189,11 @@ int sas_unregister_ha(struct sas_ha_struct *sas_ha)
 	spin_unlock_irq(&sas_ha->lock);
 	__sas_drain_work(sas_ha);
 	mutex_unlock(&sas_ha->drain_mutex);
+}
 
+int sas_unregister_ha(struct sas_ha_struct *sas_ha)
+{
+	sas_disable_events(sas_ha);
 	sas_unregister_ports(sas_ha);
 
 	/* flush unregistration work */
@@ -381,6 +385,90 @@ int sas_set_phy_speed(struct sas_phy *phy,
 	return ret;
 }
 
+void sas_prep_resume_ha(struct sas_ha_struct *ha)
+{
+	int i;
+
+	set_bit(SAS_HA_REGISTERED, &ha->state);
+
+	/* clear out any stale link events/data from the suspension path */
+	for (i = 0; i < ha->num_phys; i++) {
+		struct asd_sas_phy *phy = ha->sas_phy[i];
+
+		memset(phy->attached_sas_addr, 0, SAS_ADDR_SIZE);
+		phy->port_events_pending = 0;
+		phy->phy_events_pending = 0;
+		phy->frame_rcvd_size = 0;
+	}
+}
+EXPORT_SYMBOL(sas_prep_resume_ha);
+
+static int phys_suspended(struct sas_ha_struct *ha)
+{
+	int i, rc = 0;
+
+	for (i = 0; i < ha->num_phys; i++) {
+		struct asd_sas_phy *phy = ha->sas_phy[i];
+
+		if (phy->suspended)
+			rc++;
+	}
+
+	return rc;
+}
+
+void sas_resume_ha(struct sas_ha_struct *ha)
+{
+	const unsigned long tmo = msecs_to_jiffies(25000);
+	int i;
+
+	/* deform ports on phys that did not resume
+	 * at this point we may be racing the phy coming back (as posted
+	 * by the lldd).  So we post the event and once we are in the
+	 * libsas context check that the phy remains suspended before
+	 * tearing it down.
+	 */
+	i = phys_suspended(ha);
+	if (i)
+		dev_info(ha->dev, "waiting up to 25 seconds for %d phy%s to resume\n",
+			 i, i > 1 ? "s" : "");
+	wait_event_timeout(ha->eh_wait_q, phys_suspended(ha) == 0, tmo);
+	for (i = 0; i < ha->num_phys; i++) {
+		struct asd_sas_phy *phy = ha->sas_phy[i];
+
+		if (phy->suspended) {
+			dev_warn(&phy->phy->dev, "resume timeout\n");
+			sas_notify_phy_event(phy, PHYE_RESUME_TIMEOUT);
+		}
+	}
+
+	/* all phys are back up or timed out, turn on i/o so we can
+	 * flush out disks that did not return
+	 */
+	scsi_unblock_requests(ha->core.shost);
+	sas_drain_work(ha);
+}
+EXPORT_SYMBOL(sas_resume_ha);
+
+void sas_suspend_ha(struct sas_ha_struct *ha)
+{
+	int i;
+
+	sas_disable_events(ha);
+	scsi_block_requests(ha->core.shost);
+	for (i = 0; i < ha->num_phys; i++) {
+		struct asd_sas_port *port = ha->sas_port[i];
+
+		sas_discover_event(port, DISCE_SUSPEND);
+	}
+
+	/* flush suspend events while unregistered */
+	mutex_lock(&ha->drain_mutex);
+	__sas_drain_work(ha);
+	mutex_unlock(&ha->drain_mutex);
+}
+EXPORT_SYMBOL(sas_suspend_ha);
+
 static void sas_phy_release(struct sas_phy *phy)
 {
 	kfree(phy->hostdata);
diff --git a/drivers/scsi/libsas/sas_internal.h b/drivers/scsi/libsas/sas_internal.h
index 507e4cf..1de6796 100644
--- a/drivers/scsi/libsas/sas_internal.h
+++ b/drivers/scsi/libsas/sas_internal.h
@@ -89,6 +89,7 @@ int sas_smp_phy_control(struct domain_device *dev, int phy_id,
 			enum phy_func phy_func, struct sas_phy_linkrates *);
 int sas_smp_get_phy_events(struct sas_phy *phy);
 
+void sas_notify_phy_event(struct asd_sas_phy *phy, enum phy_event event);
 void sas_device_set_phy(struct domain_device *dev, struct sas_port *port);
 struct domain_device *sas_find_dev_by_rphy(struct sas_rphy *rphy);
 struct domain_device *sas_ex_to_ata(struct domain_device *ex_dev, int phy_id);
diff --git a/drivers/scsi/libsas/sas_phy.c b/drivers/scsi/libsas/sas_phy.c
index 521422e..cdee446 100644
--- a/drivers/scsi/libsas/sas_phy.c
+++ b/drivers/scsi/libsas/sas_phy.c
@@ -94,6 +94,25 @@ static void sas_phye_spinup_hold(struct work_struct *work)
 	i->dft->lldd_control_phy(phy, PHY_FUNC_RELEASE_SPINUP_HOLD, NULL);
 }
 
+static void sas_phye_resume_timeout(struct work_struct *work)
+{
+	struct asd_sas_event *ev = to_asd_sas_event(work);
+	struct asd_sas_phy *phy = ev->phy;
+
+	clear_bit(PHYE_RESUME_TIMEOUT, &phy->phy_events_pending);
+
+	/* phew, lldd got the phy back in the nick of time */
+	if (!phy->suspended) {
+		dev_info(&phy->phy->dev, "resume timeout cancelled\n");
+		return;
+	}
+
+	phy->error = 0;
+	phy->suspended = 0;
+	sas_deform_port(phy, 1);
+}
+
+
 /* ---------- Phy class registration ---------- */
 
 int sas_register_phys(struct sas_ha_struct *sas_ha)
@@ -105,6 +124,8 @@ int sas_register_phys(struct sas_ha_struct *sas_ha)
 		[PHYE_OOB_DONE] = sas_phye_oob_done,
 		[PHYE_OOB_ERROR] = sas_phye_oob_error,
 		[PHYE_SPINUP_HOLD] = sas_phye_spinup_hold,
+		[PHYE_RESUME_TIMEOUT] = sas_phye_resume_timeout,
+
 	};
 
 	static const work_func_t sas_port_event_fns[PORT_NUM_EVENTS] = {
diff --git a/drivers/scsi/libsas/sas_port.c b/drivers/scsi/libsas/sas_port.c
index e884a8c..1398b71 100644
--- a/drivers/scsi/libsas/sas_port.c
+++ b/drivers/scsi/libsas/sas_port.c
@@ -39,6 +39,49 @@ static bool phy_is_wideport_member(struct asd_sas_port *port, struct asd_sas_phy
 	return true;
 }
 
+static void sas_resume_port(struct asd_sas_phy *phy)
+{
+	struct domain_device *dev;
+	struct asd_sas_port *port = phy->port;
+	struct sas_ha_struct *sas_ha = phy->ha;
+	struct sas_internal *si = to_sas_internal(sas_ha->core.shost->transportt);
+
+	if (si->dft->lldd_port_formed)
+		si->dft->lldd_port_formed(phy);
+
+	if (port->suspended)
+		port->suspended = 0;
+	else {
+		/* we only need to handle "link returned" actions once */
+		return;
+	}
+
+	/* if the port came back:
+	 * 1/ presume every device came back
+	 * 2/ force the next revalidation to check all expander phys
+	 */
+	list_for_each_entry(dev, &port->dev_list, dev_list_node) {
+		int i, rc;
+
+		rc = sas_notify_lldd_dev_found(dev);
+		if (rc) {
+			sas_unregister_dev(port, dev);
+			continue;
+		}
+
+		if (dev->dev_type == EDGE_DEV || dev->dev_type == FANOUT_DEV) {
+			dev->ex_dev.ex_change_count = -1;
+			for (i = 0; i < dev->ex_dev.num_phys; i++) {
+				struct ex_phy *phy = &dev->ex_dev.ex_phy[i];
+
+				phy->phy_change_count = -1;
+			}
+		}
+	}
+
+	sas_discover_event(port, DISCE_RESUME);
+}
+
 /**
  * sas_form_port -- add this phy to a port
  * @phy: the phy of interest
@@ -58,7 +101,14 @@ static void sas_form_port(struct asd_sas_phy *phy)
 	if (port) {
 		if (!phy_is_wideport_member(port, phy))
 			sas_deform_port(phy, 0);
-		else {
+		else if (phy->suspended) {
+			phy->suspended = 0;
+			sas_resume_port(phy);
+
+			/* phy came back, try to cancel the timeout */
+			wake_up(&sas_ha->eh_wait_q);
+			return;
+		} else {
 			SAS_DPRINTK("%s: phy%d belongs to port%d already(%d)!\n",
 				    __func__, phy->id, phy->port->id,
 				    phy->port->num_phys);
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index 066c0ff..5c7677f 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -79,7 +79,8 @@ enum phy_event {
 	PHYE_OOB_DONE         = 1,
 	PHYE_OOB_ERROR        = 2,
 	PHYE_SPINUP_HOLD      = 3, /* hot plug SATA, no COMWAKE sent */
-	PHY_NUM_EVENTS        = 4,
+	PHYE_RESUME_TIMEOUT   = 4,
+	PHY_NUM_EVENTS        = 5,
 };
 
 enum discover_event {
@@ -87,8 +88,10 @@ enum discover_event {
 	DISCE_REVALIDATE_DOMAIN = 1,
 	DISCE_PORT_GONE         = 2,
 	DISCE_PROBE		= 3,
-	DISCE_DESTRUCT		= 4,
-	DISC_NUM_EVENTS		= 5,
+	DISCE_SUSPEND		= 4,
+	DISCE_RESUME		= 5,
+	DISCE_DESTRUCT		= 6,
+	DISC_NUM_EVENTS		= 7,
 };
 
 /* ---------- Expander Devices ---------- */
@@ -128,7 +131,7 @@ struct ex_phy {
 	u8   attached_sas_addr[SAS_ADDR_SIZE];
 	u8   attached_phy_id;
 
-	u8   phy_change_count;
+	int phy_change_count;
 	enum routing_attribute routing_attr;
 	u8   virtual:1;
 
@@ -141,7 +144,7 @@ struct ex_phy {
 struct expander_device {
 	struct list_head children;
 
-	u16    ex_change_count;
+	int    ex_change_count;
 	u16    max_route_indexes;
 	u8     num_phys;
 
@@ -167,6 +170,7 @@ struct sata_device {
         enum   ata_command_set command_set;
         struct smp_resp        rps_resp; /* report_phy_sata_resp */
         u8     port_no;        /* port number, if this is a PM (Port) */
+	int    pm_result;
 
 	struct ata_port *ap;
 	struct ata_host ata_host;
@@ -180,6 +184,7 @@ struct ssp_device {
 
 enum {
 	SAS_DEV_GONE,
+	SAS_DEV_FOUND, /* device notified to lldd */
 	SAS_DEV_DESTROY,
 	SAS_DEV_EH_PENDING,
 	SAS_DEV_LU_RESET,
@@ -271,6 +276,7 @@ struct asd_sas_port {
 	enum   sas_linkrate linkrate;
 
 	struct sas_work work;
+	int suspended;
 
 /* public: */
 	int id;
@@ -319,6 +325,7 @@ struct asd_sas_phy {
 	unsigned long phy_events_pending;
 
 	int error;
+	int suspended;
 
 	struct sas_phy *phy;
 
@@ -685,6 +692,9 @@ struct sas_domain_function_template {
 
 extern int sas_register_ha(struct sas_ha_struct *);
 extern int sas_unregister_ha(struct sas_ha_struct *);
+extern void sas_prep_resume_ha(struct sas_ha_struct *sas_ha);
+extern void sas_resume_ha(struct sas_ha_struct *sas_ha);
+extern void sas_suspend_ha(struct sas_ha_struct *sas_ha);
 
 int sas_set_phy_speed(struct sas_phy *phy,
 		      struct sas_phy_linkrates *rates);
diff --git a/include/scsi/sas_ata.h b/include/scsi/sas_ata.h
index 2dfbdaa..ff71a56 100644
--- a/include/scsi/sas_ata.h
+++ b/include/scsi/sas_ata.h
@@ -45,6 +45,8 @@ void sas_ata_eh(struct Scsi_Host *shost, struct list_head *work_q,
 void sas_ata_schedule_reset(struct domain_device *dev);
 void sas_ata_wait_eh(struct domain_device *dev);
 void sas_probe_sata(struct asd_sas_port *port);
+void sas_suspend_sata(struct asd_sas_port *port);
+void sas_resume_sata(struct asd_sas_port *port);
 void sas_ata_end_eh(struct ata_port *ap);
 #else
 
@@ -82,6 +84,14 @@ static inline void sas_probe_sata(struct asd_sas_port *port)
 {
 }
 
+static inline void sas_suspend_sata(struct asd_sas_port *port)
+{
+}
+
+static inline void sas_resume_sata(struct asd_sas_port *port)
+{
+}
+
 static inline int sas_get_ata_info(struct domain_device *dev, struct ex_phy *phy)
 {
 	return 0;


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 18/18] scsi: cleanup setting task state in scsi_error_handler()
  2012-05-06 18:17 [PATCH 00/18] libsas, sas_ata: update for 3.5 Dan Williams
                   ` (16 preceding siblings ...)
  2012-05-06 18:19 ` [PATCH 17/18] libsas: suspend / resume support Dan Williams
@ 2012-05-06 18:19 ` Dan Williams
  2012-05-31 18:12 ` [PATCH 00/18] libsas, sas_ata: update for 3.5 Dan Williams
  18 siblings, 0 replies; 21+ messages in thread
From: Dan Williams @ 2012-05-06 18:19 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide

Reading scsi_error_handler() one could easily come away with the
impression that it does its wakeup event check while the task state is
TASK_RUNNING.  In fact it sets TASK_INTERRUPTIBLE at the bottom of the
loop, but that is ~50 lines down.

Just set TASK_INTERRUPTIBLE at the top of loop and be done.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/scsi_error.c |    5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index cc8dc8c..b409b6f 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1818,15 +1818,14 @@ int scsi_error_handler(void *data)
 	 * We never actually get interrupted because kthread_run
 	 * disables signal delivery for the created thread.
 	 */
-	set_current_state(TASK_INTERRUPTIBLE);
 	while (!kthread_should_stop()) {
+		set_current_state(TASK_INTERRUPTIBLE);
 		if ((shost->host_failed == 0 && shost->host_eh_scheduled == 0) ||
 		    shost->host_failed != shost->host_busy) {
 			SCSI_LOG_ERROR_RECOVERY(1,
 				printk("Error handler scsi_eh_%d sleeping\n",
 					shost->host_no));
 			schedule();
-			set_current_state(TASK_INTERRUPTIBLE);
 			continue;
 		}
 
@@ -1863,9 +1862,7 @@ int scsi_error_handler(void *data)
 		scsi_restart_operations(shost);
 		if (!shost->eh_noresume)
 			scsi_autopm_put_host(shost);
-		set_current_state(TASK_INTERRUPTIBLE);
 	}
-	__set_current_state(TASK_RUNNING);
 
 	SCSI_LOG_ERROR_RECOVERY(1,
 		printk("Error handler scsi_eh_%d exiting\n", shost->host_no));


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 00/18] libsas, sas_ata: update for 3.5
  2012-05-06 18:17 [PATCH 00/18] libsas, sas_ata: update for 3.5 Dan Williams
                   ` (17 preceding siblings ...)
  2012-05-06 18:19 ` [PATCH 18/18] scsi: cleanup setting task state in scsi_error_handler() Dan Williams
@ 2012-05-31 18:12 ` Dan Williams
  2012-06-01  4:50   ` Jack Wang
  18 siblings, 1 reply; 21+ messages in thread
From: Dan Williams @ 2012-05-31 18:12 UTC (permalink / raw)
  To: linux-scsi, JBottomley, Jeff Garzik; +Cc: linux-ide, Jack Wang

On Sun, May 6, 2012 at 11:17 AM, Dan Williams <dan.j.williams@intel.com> wrote:
> 1/ libsas suspend/resume support
>
> 2/ finish off the conversion of the strategy handlers to enforce only
>   invoking them in eh context (to meet libata's expectations).
>
> 3/ a collection of discovery fixes, error handling fixes, and
>   miscellaneous cleanups
>
> All of these patches, save for patch 18, are on their 2nd or 3rd resend.
> These have been soaking in -next since before the 3.4 merge window
> opened.

...and it seems these will get to sit out another kernel

Please comment on these, I'd like to know which are NAK'd vs just ran
out of time to be integrated this round.  Jeff, some comments about
patch 12 and 14 below for you.

Can we at least get libsas suspend / resume support applied for 3.5?
If that's agreeable I'll put together a pared down pull-request??

> [PATCH 01/18] libsas: cleanup spurious calls to scsi_schedule_eh

A misuse of scsi_schedule_eh(), part of the effort to make sure
host_eh_scheduled is managed properly, but for now it's just cleanup
material because I have not root caused a test failure back to this
misuse.

> [PATCH 02/18] libata, libsas: introduce sched_eh and end_eh port ops

This was acked-by Jeff, not sure what the hold up is.

> [PATCH 03/18] scsi: fix eh wakeup (scsi_schedule_eh vs scsi_restart_operations)

You had comments about this patch, I responded and then the
communication stopped.

> [PATCH 04/18] scsi_transport_sas: fix delete vs scan race

I have a new version under test after review feedback from Mike.

> [PATCH 05/18] libsas: enforce eh strategy handlers only in eh context
> [PATCH 06/18] libsas: add sas_eh_abort_handler
> [PATCH 07/18] libsas: use ->lldd_I_T_nexus_reset for ->eh_bus_reset_handler
> [PATCH 08/18] isci: use sas eh strategy handlers

These are fixups to make sure the eh strategy handlers are called in
the proper context and that tasks are not terminated outside of
visibilty of the lldd.

> [PATCH 09/18] libsas: trim sas_task of slow path infrastructure
> [PATCH 10/18] libsas: sas_rediscover_dev did not look at the SMP exec status.
> [PATCH 11/18] mvsas: remove unused variable in mvs_task_exec()

Minor clean up and fixlets.

> [PATCH 12/18] libata: reset once

Jeff, any objections on this one?

> [PATCH 13/18] libsas: continue revalidation

Discovery ends too early in some configurations, this could be -stable material.

> [PATCH 14/18] libata: export ata_port suspend/resume infrastructure for sas

This plus patch 17 and a small enabling patch for isci and we have our
first tested libsas suspend / resume implementation in the kernel.
Needs an ack from Jeff.

> [PATCH 15/18] libsas: drop sata port multiplier infrastructure

Minor cleanup.

> [PATCH 16/18] scsi, sd: limit the scope of the async probe domain

Yea! 1 for 18 going upstream!  Sorry, couldn't resist :-).

> [PATCH 17/18] libsas: suspend / resume support

The aforementioned suspend / resume support.  I believe pm8001 is
ready to take advantage of this because their current hack is not
reliably working.

> [PATCH 18/18] scsi: cleanup setting task state in scsi_error_handler()

Just a cleanup, whenever it goes in is fine.

--
Dan

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH 00/18] libsas, sas_ata: update for 3.5
  2012-05-31 18:12 ` [PATCH 00/18] libsas, sas_ata: update for 3.5 Dan Williams
@ 2012-06-01  4:50   ` Jack Wang
  0 siblings, 0 replies; 21+ messages in thread
From: Jack Wang @ 2012-06-01  4:50 UTC (permalink / raw)
  To: 'Dan Williams', 'linux-scsi',
	JBottomley, 'Jeff Garzik'
  Cc: linux-ide

> > [PATCH 17/18] libsas: suspend / resume support
> 
> The aforementioned suspend / resume support.  I believe pm8001 is
> ready to take advantage of this because their current hack is not
> reliably working.
> 
[Jack Wang] Yes, this really help to fix suspend/resume support in pm8001,
I'll be glad to see this patch and all this patch set in, just my 0.007
cent. 


^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2012-06-01  4:50 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-05-06 18:17 [PATCH 00/18] libsas, sas_ata: update for 3.5 Dan Williams
2012-05-06 18:18 ` [PATCH 01/18] libsas: cleanup spurious calls to scsi_schedule_eh Dan Williams
2012-05-06 18:18 ` [PATCH 02/18] libata, libsas: introduce sched_eh and end_eh port ops Dan Williams
2012-05-06 18:18 ` [PATCH 03/18] scsi: fix eh wakeup (scsi_schedule_eh vs scsi_restart_operations) Dan Williams
2012-05-06 18:18 ` [PATCH 04/18] scsi_transport_sas: fix delete vs scan race Dan Williams
2012-05-06 18:18 ` [PATCH 05/18] libsas: enforce eh strategy handlers only in eh context Dan Williams
2012-05-06 18:18 ` [PATCH 06/18] libsas: add sas_eh_abort_handler Dan Williams
2012-05-06 18:18 ` [PATCH 07/18] libsas: use ->lldd_I_T_nexus_reset for ->eh_bus_reset_handler Dan Williams
2012-05-06 18:18 ` [PATCH 08/18] isci: use sas eh strategy handlers Dan Williams
2012-05-06 18:18 ` [PATCH 09/18] libsas: trim sas_task of slow path infrastructure Dan Williams
2012-05-06 18:18 ` [PATCH 10/18] libsas: sas_rediscover_dev did not look at the SMP exec status Dan Williams
2012-05-06 18:18 ` [PATCH 11/18] mvsas: remove unused variable in mvs_task_exec() Dan Williams
2012-05-06 18:18 ` [PATCH 12/18] libata: reset once Dan Williams
2012-05-06 18:19 ` [PATCH 13/18] libsas: continue revalidation Dan Williams
2012-05-06 18:19 ` [PATCH 14/18] libata: export ata_port suspend/resume infrastructure for sas Dan Williams
2012-05-06 18:19 ` [PATCH 15/18] libsas: drop sata port multiplier infrastructure Dan Williams
2012-05-06 18:19 ` [PATCH 16/18] scsi, sd: limit the scope of the async probe domain Dan Williams
2012-05-06 18:19 ` [PATCH 17/18] libsas: suspend / resume support Dan Williams
2012-05-06 18:19 ` [PATCH 18/18] scsi: cleanup setting task state in scsi_error_handler() Dan Williams
2012-05-31 18:12 ` [PATCH 00/18] libsas, sas_ata: update for 3.5 Dan Williams
2012-06-01  4:50   ` Jack Wang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.