All of lore.kernel.org
 help / color / mirror / Atom feed
* [libsas PATCH v10 0/9] libsas error handling + discovery v10
@ 2012-03-11  4:38 Dan Williams
  2012-03-11  4:39 ` [libsas PATCH v10 1/9] libsas: introduce sas_work to fix sas_drain_work vs sas_queue_work Dan Williams
                   ` (8 more replies)
  0 siblings, 9 replies; 15+ messages in thread
From: Dan Williams @ 2012-03-11  4:38 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide

Changes since v9: http://marc.info/?l=linux-scsi&m=132890988918965&w=2

1/ PATCH 1: fix drain_workqueue implementation.  v9 was corrupting the
   ->entry list_head of in-flight works when it attempted to place them on
   the defer_q to wait for the last drain to complete.  'sas_work' provides
   a local list_head for this purpose.

2/ PATCH 2,3: cleanup host_eh_scheduled handling conflicts between
   libata and libsas.

3/ dropped "libsas: libsas.force_hard_reset module parameter"

4/ PATCH 4: replaced "libsas: enforce eh strategy handlers only in eh
   context" with a new version that achieves the same goal (prevent the
   ->eh_{bus|device}_reset handlers from triggering resets outside of eh
   context) while still allowing sg_reset.

5/ PATCH 6: fixed up "libsas: use ->lldd_I_T_nexus_reset for
   ->eh_bus_reset_handler" due to internal feedback from Jacek.

6/ PATCH 7: fixed up "libsas: trim sas_task of slow path infrastructure"
   due to feedback from Jack.

7/ PATCH 8,9: two new fixes from Jeff and Tom: "libsas: fix
   sas_find_bcast_phy() in the presence of 'vacant' phys" and "libsas:
   sas_rediscover_dev did not look at the SMP exec status."

8/ Picked up [SCSI] mvsas: remove unused variable in mvs_task_exec()
   (http://marc.info/?l=linux-scsi&m=133119166619140&w=2)

[libsas PATCH v10 1/9] libsas: introduce sas_work to fix sas_drain_work vs sas_queue_work
[libsas PATCH v10 2/9] libsas: cleanup spurious calls to scsi_schedule_eh
[libsas PATCH v10 3/9] libata, libsas: introduce sched_eh and end_eh port ops
[libsas PATCH v10 4/9] libsas: enforce eh strategy handlers only in eh context
[libsas PATCH v10 5/9] libsas: add sas_eh_abort_handler
[libsas PATCH v10 6/9] libsas: use ->lldd_I_T_nexus_reset for ->eh_bus_reset_handler
[libsas PATCH v10 7/9] libsas: trim sas_task of slow path infrastructure
[libsas PATCH v10 8/9] libsas: fix sas_find_bcast_phy() in the presence of 'vacant' phys
[libsas PATCH v10 9/9] libsas: sas_rediscover_dev did not look at the SMP exec status.

diffstat since v9:
 Documentation/kernel-parameters.txt |    6 --
 drivers/ata/libata-core.c           |    4 +
 drivers/ata/libata-eh.c             |   57 +++++++++++---
 drivers/scsi/libsas/sas_ata.c       |   63 ++++++++-------
 drivers/scsi/libsas/sas_discover.c  |   45 +++++++----
 drivers/scsi/libsas/sas_event.c     |   36 +++++----
 drivers/scsi/libsas/sas_expander.c  |   24 +++++--
 drivers/scsi/libsas/sas_init.c      |   33 ++++-----
 drivers/scsi/libsas/sas_internal.h  |    6 +-
 drivers/scsi/libsas/sas_phy.c       |   21 ++----
 drivers/scsi/libsas/sas_port.c      |   15 +---
 drivers/scsi/libsas/sas_scsi_host.c |  142 +++++++++++++++++++++++++++++++---
 drivers/scsi/mvsas/mv_sas.c         |    1 -
 drivers/scsi/pm8001/pm8001_sas.c    |    7 +-
 include/linux/libata.h              |    4 +
 include/scsi/libsas.h               |   54 ++++++++++++-
 include/scsi/sas_ata.h              |    9 +-
 17 files changed, 371 insertions(+), 156 deletions(-)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [libsas PATCH v10 1/9] libsas: introduce sas_work to fix sas_drain_work vs sas_queue_work
  2012-03-11  4:38 [libsas PATCH v10 0/9] libsas error handling + discovery v10 Dan Williams
@ 2012-03-11  4:39 ` Dan Williams
  2012-03-11  4:39 ` [libsas PATCH v10 2/9] libsas: cleanup spurious calls to scsi_schedule_eh Dan Williams
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Dan Williams @ 2012-03-11  4:39 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide

When requeuing work to a draining workqueue the last work instance may
not be idle, so sas_queue_work() must not touch work->entry.  Introduce
sas_work with a drain_node list_head to have a private list for
collecting work deferred due to drain collision.

Fixes reports like:
  BUG: unable to handle kernel NULL pointer dereference at           (null)
  IP: [<ffffffff810410d4>] process_one_work+0x2e/0x338

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/libsas/sas_discover.c |   28 +++++++++++++------------
 drivers/scsi/libsas/sas_event.c    |   24 ++++++++++++----------
 drivers/scsi/libsas/sas_init.c     |   11 +++++-----
 drivers/scsi/libsas/sas_internal.h |    6 +++--
 drivers/scsi/libsas/sas_phy.c      |   21 ++++++-------------
 drivers/scsi/libsas/sas_port.c     |   15 +++++---------
 include/scsi/libsas.h              |   40 ++++++++++++++++++++++++++++++++----
 7 files changed, 83 insertions(+), 62 deletions(-)

diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
index 7a3ae48..bd8c636 100644
--- a/drivers/scsi/libsas/sas_discover.c
+++ b/drivers/scsi/libsas/sas_discover.c
@@ -205,8 +205,7 @@ void sas_notify_lldd_dev_gone(struct domain_device *dev)
 static void sas_probe_devices(struct work_struct *work)
 {
 	struct domain_device *dev, *n;
-	struct sas_discovery_event *ev =
-		container_of(work, struct sas_discovery_event, work);
+	struct sas_discovery_event *ev = to_sas_discovery_event(work);
 	struct asd_sas_port *port = ev->port;
 
 	clear_bit(DISCE_PROBE, &port->disc.pending);
@@ -291,8 +290,7 @@ static void sas_unregister_common_dev(struct asd_sas_port *port, struct domain_d
 static void sas_destruct_devices(struct work_struct *work)
 {
 	struct domain_device *dev, *n;
-	struct sas_discovery_event *ev =
-		container_of(work, struct sas_discovery_event, work);
+	struct sas_discovery_event *ev = to_sas_discovery_event(work);
 	struct asd_sas_port *port = ev->port;
 
 	clear_bit(DISCE_DESTRUCT, &port->disc.pending);
@@ -377,8 +375,7 @@ static void sas_discover_domain(struct work_struct *work)
 {
 	struct domain_device *dev;
 	int error = 0;
-	struct sas_discovery_event *ev =
-		container_of(work, struct sas_discovery_event, work);
+	struct sas_discovery_event *ev = to_sas_discovery_event(work);
 	struct asd_sas_port *port = ev->port;
 
 	clear_bit(DISCE_DISCOVER_DOMAIN, &port->disc.pending);
@@ -437,8 +434,7 @@ static void sas_discover_domain(struct work_struct *work)
 static void sas_revalidate_domain(struct work_struct *work)
 {
 	int res = 0;
-	struct sas_discovery_event *ev =
-		container_of(work, struct sas_discovery_event, work);
+	struct sas_discovery_event *ev = to_sas_discovery_event(work);
 	struct asd_sas_port *port = ev->port;
 	struct sas_ha_struct *ha = port->ha;
 
@@ -466,21 +462,25 @@ static void sas_revalidate_domain(struct work_struct *work)
 
 /* ---------- Events ---------- */
 
-static void sas_chain_work(struct sas_ha_struct *ha, struct work_struct *work)
+static void sas_chain_work(struct sas_ha_struct *ha, struct sas_work *sw)
 {
-	/* chained work is not subject to SA_HA_DRAINING or SAS_HA_REGISTERED */
-	scsi_queue_work(ha->core.shost, work);
+	/* chained work is not subject to SA_HA_DRAINING or
+	 * SAS_HA_REGISTERED, because it is either submitted in the
+	 * workqueue, or known to be submitted from a context that is
+	 * not racing against draining
+	 */
+	scsi_queue_work(ha->core.shost, &sw->work);
 }
 
 static void sas_chain_event(int event, unsigned long *pending,
-			    struct work_struct *work,
+			    struct sas_work *sw,
 			    struct sas_ha_struct *ha)
 {
 	if (!test_and_set_bit(event, pending)) {
 		unsigned long flags;
 
 		spin_lock_irqsave(&ha->state_lock, flags);
-		sas_chain_work(ha, work);
+		sas_chain_work(ha, sw);
 		spin_unlock_irqrestore(&ha->state_lock, flags);
 	}
 }
@@ -519,7 +519,7 @@ void sas_init_disc(struct sas_discovery *disc, struct asd_sas_port *port)
 
 	disc->pending = 0;
 	for (i = 0; i < DISC_NUM_EVENTS; i++) {
-		INIT_WORK(&disc->disc_work[i].work, sas_event_fns[i]);
+		INIT_SAS_WORK(&disc->disc_work[i].work, sas_event_fns[i]);
 		disc->disc_work[i].port = port;
 	}
 }
diff --git a/drivers/scsi/libsas/sas_event.c b/drivers/scsi/libsas/sas_event.c
index 16639bb..4e4292d 100644
--- a/drivers/scsi/libsas/sas_event.c
+++ b/drivers/scsi/libsas/sas_event.c
@@ -27,19 +27,21 @@
 #include "sas_internal.h"
 #include "sas_dump.h"
 
-void sas_queue_work(struct sas_ha_struct *ha, struct work_struct *work)
+void sas_queue_work(struct sas_ha_struct *ha, struct sas_work *sw)
 {
 	if (!test_bit(SAS_HA_REGISTERED, &ha->state))
 		return;
 
-	if (test_bit(SAS_HA_DRAINING, &ha->state))
-		list_add(&work->entry, &ha->defer_q);
-	else
-		scsi_queue_work(ha->core.shost, work);
+	if (test_bit(SAS_HA_DRAINING, &ha->state)) {
+		/* add it to the defer list, if not already pending */
+		if (list_empty(&sw->drain_node))
+			list_add(&sw->drain_node, &ha->defer_q);
+	} else
+		scsi_queue_work(ha->core.shost, &sw->work);
 }
 
 static void sas_queue_event(int event, unsigned long *pending,
-			    struct work_struct *work,
+			    struct sas_work *work,
 			    struct sas_ha_struct *ha)
 {
 	if (!test_and_set_bit(event, pending)) {
@@ -55,7 +57,7 @@ static void sas_queue_event(int event, unsigned long *pending,
 void __sas_drain_work(struct sas_ha_struct *ha)
 {
 	struct workqueue_struct *wq = ha->core.shost->work_q;
-	struct work_struct *w, *_w;
+	struct sas_work *sw, *_sw;
 
 	set_bit(SAS_HA_DRAINING, &ha->state);
 	/* flush submitters */
@@ -66,9 +68,9 @@ void __sas_drain_work(struct sas_ha_struct *ha)
 
 	spin_lock_irq(&ha->state_lock);
 	clear_bit(SAS_HA_DRAINING, &ha->state);
-	list_for_each_entry_safe(w, _w, &ha->defer_q, entry) {
-		list_del_init(&w->entry);
-		sas_queue_work(ha, w);
+	list_for_each_entry_safe(sw, _sw, &ha->defer_q, drain_node) {
+		list_del_init(&sw->drain_node);
+		sas_queue_work(ha, sw);
 	}
 	spin_unlock_irq(&ha->state_lock);
 }
@@ -151,7 +153,7 @@ int sas_init_events(struct sas_ha_struct *sas_ha)
 	int i;
 
 	for (i = 0; i < HA_NUM_EVENTS; i++) {
-		INIT_WORK(&sas_ha->ha_events[i].work, sas_ha_event_fns[i]);
+		INIT_SAS_WORK(&sas_ha->ha_events[i].work, sas_ha_event_fns[i]);
 		sas_ha->ha_events[i].ha = sas_ha;
 	}
 
diff --git a/drivers/scsi/libsas/sas_init.c b/drivers/scsi/libsas/sas_init.c
index 120bff6..10cb5ae 100644
--- a/drivers/scsi/libsas/sas_init.c
+++ b/drivers/scsi/libsas/sas_init.c
@@ -94,8 +94,7 @@ void sas_hash_addr(u8 *hashed, const u8 *sas_addr)
 
 void sas_hae_reset(struct work_struct *work)
 {
-	struct sas_ha_event *ev =
-		container_of(work, struct sas_ha_event, work);
+	struct sas_ha_event *ev = to_sas_ha_event(work);
 	struct sas_ha_struct *ha = ev->ha;
 
 	clear_bit(HAE_RESET, &ha->pending);
@@ -369,14 +368,14 @@ static void sas_phy_release(struct sas_phy *phy)
 
 static void phy_reset_work(struct work_struct *work)
 {
-	struct sas_phy_data *d = container_of(work, typeof(*d), reset_work);
+	struct sas_phy_data *d = container_of(work, typeof(*d), reset_work.work);
 
 	d->reset_result = transport_sas_phy_reset(d->phy, d->hard_reset);
 }
 
 static void phy_enable_work(struct work_struct *work)
 {
-	struct sas_phy_data *d = container_of(work, typeof(*d), enable_work);
+	struct sas_phy_data *d = container_of(work, typeof(*d), enable_work.work);
 
 	d->enable_result = sas_phy_enable(d->phy, d->enable);
 }
@@ -389,8 +388,8 @@ static int sas_phy_setup(struct sas_phy *phy)
 		return -ENOMEM;
 
 	mutex_init(&d->event_lock);
-	INIT_WORK(&d->reset_work, phy_reset_work);
-	INIT_WORK(&d->enable_work, phy_enable_work);
+	INIT_SAS_WORK(&d->reset_work, phy_reset_work);
+	INIT_SAS_WORK(&d->enable_work, phy_enable_work);
 	d->phy = phy;
 	phy->hostdata = d;
 
diff --git a/drivers/scsi/libsas/sas_internal.h b/drivers/scsi/libsas/sas_internal.h
index f05c638..507e4cf 100644
--- a/drivers/scsi/libsas/sas_internal.h
+++ b/drivers/scsi/libsas/sas_internal.h
@@ -45,10 +45,10 @@ struct sas_phy_data {
 	struct mutex event_lock;
 	int hard_reset;
 	int reset_result;
-	struct work_struct reset_work;
+	struct sas_work reset_work;
 	int enable;
 	int enable_result;
-	struct work_struct enable_work;
+	struct sas_work enable_work;
 };
 
 void sas_scsi_recover_host(struct Scsi_Host *shost);
@@ -80,7 +80,7 @@ void sas_porte_broadcast_rcvd(struct work_struct *work);
 void sas_porte_link_reset_err(struct work_struct *work);
 void sas_porte_timer_event(struct work_struct *work);
 void sas_porte_hard_reset(struct work_struct *work);
-void sas_queue_work(struct sas_ha_struct *ha, struct work_struct *work);
+void sas_queue_work(struct sas_ha_struct *ha, struct sas_work *sw);
 
 int sas_notify_lldd_dev_found(struct domain_device *);
 void sas_notify_lldd_dev_gone(struct domain_device *);
diff --git a/drivers/scsi/libsas/sas_phy.c b/drivers/scsi/libsas/sas_phy.c
index dcfd4a9..521422e 100644
--- a/drivers/scsi/libsas/sas_phy.c
+++ b/drivers/scsi/libsas/sas_phy.c
@@ -32,8 +32,7 @@
 
 static void sas_phye_loss_of_signal(struct work_struct *work)
 {
-	struct asd_sas_event *ev =
-		container_of(work, struct asd_sas_event, work);
+	struct asd_sas_event *ev = to_asd_sas_event(work);
 	struct asd_sas_phy *phy = ev->phy;
 
 	clear_bit(PHYE_LOSS_OF_SIGNAL, &phy->phy_events_pending);
@@ -43,8 +42,7 @@ static void sas_phye_loss_of_signal(struct work_struct *work)
 
 static void sas_phye_oob_done(struct work_struct *work)
 {
-	struct asd_sas_event *ev =
-		container_of(work, struct asd_sas_event, work);
+	struct asd_sas_event *ev = to_asd_sas_event(work);
 	struct asd_sas_phy *phy = ev->phy;
 
 	clear_bit(PHYE_OOB_DONE, &phy->phy_events_pending);
@@ -53,8 +51,7 @@ static void sas_phye_oob_done(struct work_struct *work)
 
 static void sas_phye_oob_error(struct work_struct *work)
 {
-	struct asd_sas_event *ev =
-		container_of(work, struct asd_sas_event, work);
+	struct asd_sas_event *ev = to_asd_sas_event(work);
 	struct asd_sas_phy *phy = ev->phy;
 	struct sas_ha_struct *sas_ha = phy->ha;
 	struct asd_sas_port *port = phy->port;
@@ -85,8 +82,7 @@ static void sas_phye_oob_error(struct work_struct *work)
 
 static void sas_phye_spinup_hold(struct work_struct *work)
 {
-	struct asd_sas_event *ev =
-		container_of(work, struct asd_sas_event, work);
+	struct asd_sas_event *ev = to_asd_sas_event(work);
 	struct asd_sas_phy *phy = ev->phy;
 	struct sas_ha_struct *sas_ha = phy->ha;
 	struct sas_internal *i =
@@ -127,14 +123,12 @@ int sas_register_phys(struct sas_ha_struct *sas_ha)
 		phy->error = 0;
 		INIT_LIST_HEAD(&phy->port_phy_el);
 		for (k = 0; k < PORT_NUM_EVENTS; k++) {
-			INIT_WORK(&phy->port_events[k].work,
-				  sas_port_event_fns[k]);
+			INIT_SAS_WORK(&phy->port_events[k].work, sas_port_event_fns[k]);
 			phy->port_events[k].phy = phy;
 		}
 
 		for (k = 0; k < PHY_NUM_EVENTS; k++) {
-			INIT_WORK(&phy->phy_events[k].work,
-				  sas_phy_event_fns[k]);
+			INIT_SAS_WORK(&phy->phy_events[k].work, sas_phy_event_fns[k]);
 			phy->phy_events[k].phy = phy;
 		}
 
@@ -144,8 +138,7 @@ int sas_register_phys(struct sas_ha_struct *sas_ha)
 		spin_lock_init(&phy->sas_prim_lock);
 		phy->frame_rcvd_size = 0;
 
-		phy->phy = sas_phy_alloc(&sas_ha->core.shost->shost_gendev,
-					 i);
+		phy->phy = sas_phy_alloc(&sas_ha->core.shost->shost_gendev, i);
 		if (!phy->phy)
 			return -ENOMEM;
 
diff --git a/drivers/scsi/libsas/sas_port.c b/drivers/scsi/libsas/sas_port.c
index eb19c01..1cf7d75 100644
--- a/drivers/scsi/libsas/sas_port.c
+++ b/drivers/scsi/libsas/sas_port.c
@@ -208,8 +208,7 @@ void sas_deform_port(struct asd_sas_phy *phy, int gone)
 
 void sas_porte_bytes_dmaed(struct work_struct *work)
 {
-	struct asd_sas_event *ev =
-		container_of(work, struct asd_sas_event, work);
+	struct asd_sas_event *ev = to_asd_sas_event(work);
 	struct asd_sas_phy *phy = ev->phy;
 
 	clear_bit(PORTE_BYTES_DMAED, &phy->port_events_pending);
@@ -219,8 +218,7 @@ void sas_porte_bytes_dmaed(struct work_struct *work)
 
 void sas_porte_broadcast_rcvd(struct work_struct *work)
 {
-	struct asd_sas_event *ev =
-		container_of(work, struct asd_sas_event, work);
+	struct asd_sas_event *ev = to_asd_sas_event(work);
 	struct asd_sas_phy *phy = ev->phy;
 	unsigned long flags;
 	u32 prim;
@@ -237,8 +235,7 @@ void sas_porte_broadcast_rcvd(struct work_struct *work)
 
 void sas_porte_link_reset_err(struct work_struct *work)
 {
-	struct asd_sas_event *ev =
-		container_of(work, struct asd_sas_event, work);
+	struct asd_sas_event *ev = to_asd_sas_event(work);
 	struct asd_sas_phy *phy = ev->phy;
 
 	clear_bit(PORTE_LINK_RESET_ERR, &phy->port_events_pending);
@@ -248,8 +245,7 @@ void sas_porte_link_reset_err(struct work_struct *work)
 
 void sas_porte_timer_event(struct work_struct *work)
 {
-	struct asd_sas_event *ev =
-		container_of(work, struct asd_sas_event, work);
+	struct asd_sas_event *ev = to_asd_sas_event(work);
 	struct asd_sas_phy *phy = ev->phy;
 
 	clear_bit(PORTE_TIMER_EVENT, &phy->port_events_pending);
@@ -259,8 +255,7 @@ void sas_porte_timer_event(struct work_struct *work)
 
 void sas_porte_hard_reset(struct work_struct *work)
 {
-	struct asd_sas_event *ev =
-		container_of(work, struct asd_sas_event, work);
+	struct asd_sas_event *ev = to_asd_sas_event(work);
 	struct asd_sas_phy *phy = ev->phy;
 
 	clear_bit(PORTE_HARD_RESET, &phy->port_events_pending);
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index 5f5ed1b..f4f1c96 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -217,11 +217,29 @@ struct domain_device {
 	struct kref kref;
 };
 
-struct sas_discovery_event {
+struct sas_work {
+	struct list_head drain_node;
 	struct work_struct work;
+};
+
+static inline void INIT_SAS_WORK(struct sas_work *sw, void (*fn)(struct work_struct *))
+{
+	INIT_WORK(&sw->work, fn);
+	INIT_LIST_HEAD(&sw->drain_node);
+}
+
+struct sas_discovery_event {
+	struct sas_work work;
 	struct asd_sas_port *port;
 };
 
+static inline struct sas_discovery_event *to_sas_discovery_event(struct work_struct *work)
+{
+	struct sas_discovery_event *ev = container_of(work, typeof(*ev), work.work);
+
+	return ev;
+}
+
 struct sas_discovery {
 	struct sas_discovery_event disc_work[DISC_NUM_EVENTS];
 	unsigned long    pending;
@@ -244,7 +262,7 @@ struct asd_sas_port {
 	struct list_head destroy_list;
 	enum   sas_linkrate linkrate;
 
-	struct work_struct work;
+	struct sas_work work;
 
 /* public: */
 	int id;
@@ -270,10 +288,17 @@ struct asd_sas_port {
 };
 
 struct asd_sas_event {
-	struct work_struct work;
+	struct sas_work work;
 	struct asd_sas_phy *phy;
 };
 
+static inline struct asd_sas_event *to_asd_sas_event(struct work_struct *work)
+{
+	struct asd_sas_event *ev = container_of(work, typeof(*ev), work.work);
+
+	return ev;
+}
+
 /* The phy pretty much is controlled by the LLDD.
  * The class only reads those fields.
  */
@@ -333,10 +358,17 @@ struct scsi_core {
 };
 
 struct sas_ha_event {
-	struct work_struct work;
+	struct sas_work work;
 	struct sas_ha_struct *ha;
 };
 
+static inline struct sas_ha_event *to_sas_ha_event(struct work_struct *work)
+{
+	struct sas_ha_event *ev = container_of(work, typeof(*ev), work.work);
+
+	return ev;
+}
+
 enum sas_ha_state {
 	SAS_HA_REGISTERED,
 	SAS_HA_DRAINING,


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [libsas PATCH v10 2/9] libsas: cleanup spurious calls to scsi_schedule_eh
  2012-03-11  4:38 [libsas PATCH v10 0/9] libsas error handling + discovery v10 Dan Williams
  2012-03-11  4:39 ` [libsas PATCH v10 1/9] libsas: introduce sas_work to fix sas_drain_work vs sas_queue_work Dan Williams
@ 2012-03-11  4:39 ` Dan Williams
  2012-03-16 15:25   ` Dan Williams
  2012-03-11  4:39 ` [libsas PATCH v10 3/9] libata, libsas: introduce sched_eh and end_eh port ops Dan Williams
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 15+ messages in thread
From: Dan Williams @ 2012-03-11  4:39 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide

eh is woken up automatically by the presence of failed commands,
scsi_schedule_eh is reserved for cases where there are no failed
commands.  This guarantees that host_eh_sceduled is only incremented
when an explicit eh request is made.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/libsas/sas_ata.c       |   25 -------------------------
 drivers/scsi/libsas/sas_scsi_host.c |    5 -----
 include/scsi/sas_ata.h              |    4 ----
 3 files changed, 0 insertions(+), 34 deletions(-)

diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
index bc0cecc..729a7b6 100644
--- a/drivers/scsi/libsas/sas_ata.c
+++ b/drivers/scsi/libsas/sas_ata.c
@@ -574,31 +574,6 @@ int sas_ata_init_host_and_port(struct domain_device *found_dev)
 	return 0;
 }
 
-void sas_ata_task_abort(struct sas_task *task)
-{
-	struct ata_queued_cmd *qc = task->uldd_task;
-	struct completion *waiting;
-
-	/* Bounce SCSI-initiated commands to the SCSI EH */
-	if (qc->scsicmd) {
-		struct request_queue *q = qc->scsicmd->device->request_queue;
-		unsigned long flags;
-
-		spin_lock_irqsave(q->queue_lock, flags);
-		blk_abort_request(qc->scsicmd->request);
-		spin_unlock_irqrestore(q->queue_lock, flags);
-		scsi_schedule_eh(qc->scsicmd->device->host);
-		return;
-	}
-
-	/* Internal command, fake a timeout and complete. */
-	qc->flags &= ~ATA_QCFLAG_ACTIVE;
-	qc->flags |= ATA_QCFLAG_FAILED;
-	qc->err_mask |= AC_ERR_TIMEOUT;
-	waiting = qc->private_data;
-	complete(waiting);
-}
-
 static void sas_get_ata_command_set(struct domain_device *dev)
 {
 	struct dev_to_host_fis *fis =
diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index f0b9b7b..49a9113 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -992,10 +992,6 @@ void sas_task_abort(struct sas_task *task)
 			return;
 		task->timer.function(task->timer.data);
 		return;
-	}
-
-	if (dev_is_sata(task->dev)) {
-		sas_ata_task_abort(task);
 	} else {
 		struct request_queue *q = sc->device->request_queue;
 		unsigned long flags;
@@ -1003,7 +999,6 @@ void sas_task_abort(struct sas_task *task)
 		spin_lock_irqsave(q->queue_lock, flags);
 		blk_abort_request(sc->request);
 		spin_unlock_irqrestore(q->queue_lock, flags);
-		scsi_schedule_eh(sc->device->host);
 	}
 }
 
diff --git a/include/scsi/sas_ata.h b/include/scsi/sas_ata.h
index cdccd2e..6d5d60c 100644
--- a/include/scsi/sas_ata.h
+++ b/include/scsi/sas_ata.h
@@ -38,7 +38,6 @@ static inline int dev_is_sata(struct domain_device *dev)
 
 int sas_get_ata_info(struct domain_device *dev, struct ex_phy *phy);
 int sas_ata_init_host_and_port(struct domain_device *found_dev);
-void sas_ata_task_abort(struct sas_task *task);
 void sas_ata_strategy_handler(struct Scsi_Host *shost);
 void sas_ata_eh(struct Scsi_Host *shost, struct list_head *work_q,
 		struct list_head *done_q);
@@ -56,9 +55,6 @@ static inline int sas_ata_init_host_and_port(struct domain_device *found_dev)
 {
 	return 0;
 }
-static inline void sas_ata_task_abort(struct sas_task *task)
-{
-}
 
 static inline void sas_ata_strategy_handler(struct Scsi_Host *shost)
 {


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [libsas PATCH v10 3/9] libata, libsas: introduce sched_eh and end_eh port ops
  2012-03-11  4:38 [libsas PATCH v10 0/9] libsas error handling + discovery v10 Dan Williams
  2012-03-11  4:39 ` [libsas PATCH v10 1/9] libsas: introduce sas_work to fix sas_drain_work vs sas_queue_work Dan Williams
  2012-03-11  4:39 ` [libsas PATCH v10 2/9] libsas: cleanup spurious calls to scsi_schedule_eh Dan Williams
@ 2012-03-11  4:39 ` Dan Williams
  2012-04-11  2:13   ` Dan Williams
  2012-03-11  4:39 ` [libsas PATCH v10 4/9] libsas: enforce eh strategy handlers only in eh context Dan Williams
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 15+ messages in thread
From: Dan Williams @ 2012-03-11  4:39 UTC (permalink / raw)
  To: linux-scsi; +Cc: Tejun Heo, linux-ide, Jacek Danecki

When managing shost->host_eh_scheduled libata assumes that there is a
1:1 shost-to-ata_port relationship.  libsas creates a 1:N relationship
so it needs to manage host_eh_scheduled cumulatively at the host level.
The sched_eh and end_eh port port ops allow libsas to track when domain
devices enter/leave the "eh-pending" state under ha->lock (previously
named ha->state_lock, but it is no longer just a lock for ha->state
changes).

Since host_eh_scheduled indicates eh without backing commands pinning
the device it can be deallocated at any time.  Move the taking of the
domain_device reference under the port_lock to guarantee that the
ata_port stays around for the duration of eh.

Cc: Tejun Heo <tj@kernel.org>
Acked-by: Jacek Danecki <jacek.danecki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/ata/libata-core.c           |    4 ++
 drivers/ata/libata-eh.c             |   57 ++++++++++++++++++++++++++++-------
 drivers/scsi/libsas/sas_ata.c       |   38 +++++++++++++++++++++--
 drivers/scsi/libsas/sas_discover.c  |    6 ++--
 drivers/scsi/libsas/sas_event.c     |   12 ++++---
 drivers/scsi/libsas/sas_init.c      |   14 ++++-----
 drivers/scsi/libsas/sas_scsi_host.c |   27 +++++++++++++----
 include/linux/libata.h              |    4 ++
 include/scsi/libsas.h               |    4 ++
 include/scsi/sas_ata.h              |    5 +++
 10 files changed, 134 insertions(+), 37 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 1654c94..827881e 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -79,6 +79,8 @@ const struct ata_port_operations ata_base_port_ops = {
 	.prereset		= ata_std_prereset,
 	.postreset		= ata_std_postreset,
 	.error_handler		= ata_std_error_handler,
+	.sched_eh		= ata_std_sched_eh,
+	.end_eh			= ata_std_end_eh,
 };
 
 const struct ata_port_operations sata_port_ops = {
@@ -6573,6 +6575,8 @@ struct ata_port_operations ata_dummy_port_ops = {
 	.qc_prep		= ata_noop_qc_prep,
 	.qc_issue		= ata_dummy_qc_issue,
 	.error_handler		= ata_dummy_error_handler,
+	.sched_eh		= ata_std_sched_eh,
+	.end_eh			= ata_std_end_eh,
 };
 
 const struct ata_port_info ata_dummy_port_info = {
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index c61316e..4f12f63 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -793,12 +793,12 @@ void ata_scsi_port_error_handler(struct Scsi_Host *host, struct ata_port *ap)
 		ata_for_each_link(link, ap, HOST_FIRST)
 			memset(&link->eh_info, 0, sizeof(link->eh_info));
 
-		/* Clear host_eh_scheduled while holding ap->lock such
-		 * that if exception occurs after this point but
-		 * before EH completion, SCSI midlayer will
+		/* end eh (clear host_eh_scheduled) while holding
+		 * ap->lock such that if exception occurs after this
+		 * point but before EH completion, SCSI midlayer will
 		 * re-initiate EH.
 		 */
-		host->host_eh_scheduled = 0;
+		ap->ops->end_eh(ap);
 
 		spin_unlock_irqrestore(ap->lock, flags);
 		ata_eh_release(ap);
@@ -986,16 +986,13 @@ void ata_qc_schedule_eh(struct ata_queued_cmd *qc)
 }
 
 /**
- *	ata_port_schedule_eh - schedule error handling without a qc
- *	@ap: ATA port to schedule EH for
- *
- *	Schedule error handling for @ap.  EH will kick in as soon as
- *	all commands are drained.
+ * ata_std_sched_eh - non-libsas ata_ports issue eh with this common routine
+ * @ap: ATA port to schedule EH for
  *
- *	LOCKING:
+ *	LOCKING: inherited from ata_port_schedule_eh
  *	spin_lock_irqsave(host lock)
  */
-void ata_port_schedule_eh(struct ata_port *ap)
+void ata_std_sched_eh(struct ata_port *ap)
 {
 	WARN_ON(!ap->ops->error_handler);
 
@@ -1007,6 +1004,44 @@ void ata_port_schedule_eh(struct ata_port *ap)
 
 	DPRINTK("port EH scheduled\n");
 }
+EXPORT_SYMBOL_GPL(ata_std_sched_eh);
+
+/**
+ * ata_std_end_eh - non-libsas ata_ports complete eh with this common routine
+ * @ap: ATA port to end EH for
+ *
+ * In the libata object model there is a 1:1 mapping of ata_port to
+ * shost, so host fields can be directly manipulated under ap->lock, in
+ * the libsas case we need to hold a lock at the ha->level to coordinate
+ * these events.
+ *
+ *	LOCKING:
+ *	spin_lock_irqsave(host lock)
+ */
+void ata_std_end_eh(struct ata_port *ap)
+{
+	struct Scsi_Host *host = ap->scsi_host;
+
+	host->host_eh_scheduled = 0;
+}
+EXPORT_SYMBOL(ata_std_end_eh);
+
+
+/**
+ *	ata_port_schedule_eh - schedule error handling without a qc
+ *	@ap: ATA port to schedule EH for
+ *
+ *	Schedule error handling for @ap.  EH will kick in as soon as
+ *	all commands are drained.
+ *
+ *	LOCKING:
+ *	spin_lock_irqsave(host lock)
+ */
+void ata_port_schedule_eh(struct ata_port *ap)
+{
+	/* see: ata_std_sched_eh, unless you know better */
+	ap->ops->sched_eh(ap);
+}
 
 static int ata_do_link_abort(struct ata_port *ap, struct ata_link *link)
 {
diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
index 729a7b6..d30a093 100644
--- a/drivers/scsi/libsas/sas_ata.c
+++ b/drivers/scsi/libsas/sas_ata.c
@@ -523,6 +523,31 @@ static void sas_ata_set_dmamode(struct ata_port *ap, struct ata_device *ata_dev)
 		i->dft->lldd_ata_set_dmamode(dev);
 }
 
+static void sas_ata_sched_eh(struct ata_port *ap)
+{
+	struct domain_device *dev = ap->private_data;
+	struct sas_ha_struct *ha = dev->port->ha;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ha->lock, flags);
+	if (!test_and_set_bit(SAS_DEV_EH_PENDING, &dev->state))
+		ha->eh_active++;
+	ata_std_sched_eh(ap);
+	spin_unlock_irqrestore(&ha->lock, flags);
+}
+
+void sas_ata_end_eh(struct ata_port *ap)
+{
+	struct domain_device *dev = ap->private_data;
+	struct sas_ha_struct *ha = dev->port->ha;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ha->lock, flags);
+	if (test_and_clear_bit(SAS_DEV_EH_PENDING, &dev->state))
+		ha->eh_active--;
+	spin_unlock_irqrestore(&ha->lock, flags);
+}
+
 static struct ata_port_operations sas_sata_ops = {
 	.prereset		= ata_std_prereset,
 	.hardreset		= sas_ata_hard_reset,
@@ -536,6 +561,8 @@ static struct ata_port_operations sas_sata_ops = {
 	.port_start		= ata_sas_port_start,
 	.port_stop		= ata_sas_port_stop,
 	.set_dmamode		= sas_ata_set_dmamode,
+	.sched_eh		= sas_ata_sched_eh,
+	.end_eh			= sas_ata_end_eh,
 };
 
 static struct ata_port_info sata_port_info = {
@@ -684,10 +711,6 @@ static void async_sas_ata_eh(void *data, async_cookie_t cookie)
 	struct ata_port *ap = dev->sata_dev.ap;
 	struct sas_ha_struct *ha = dev->port->ha;
 
-	/* hold a reference over eh since we may be racing with final
-	 * remove once all commands are completed
-	 */
-	kref_get(&dev->kref);
 	sas_ata_printk(KERN_DEBUG, dev, "dev error handler\n");
 	ata_scsi_port_error_handler(ha->core.shost, ap);
 	sas_put_device(dev);
@@ -730,6 +753,13 @@ void sas_ata_strategy_handler(struct Scsi_Host *shost)
 		list_for_each_entry(dev, &port->dev_list, dev_list_node) {
 			if (!sas_ata_dev_eh_valid(dev))
 				continue;
+
+			/* hold a reference over eh since we may be
+			 * racing with final remove once all commands
+			 * are completed
+			 */
+			kref_get(&dev->kref);
+
 			async_schedule_domain(async_sas_ata_eh, dev, &async);
 		}
 		spin_unlock(&port->dev_list_lock);
diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
index bd8c636..a730ac9 100644
--- a/drivers/scsi/libsas/sas_discover.c
+++ b/drivers/scsi/libsas/sas_discover.c
@@ -282,6 +282,8 @@ static void sas_unregister_common_dev(struct asd_sas_port *port, struct domain_d
 
 	spin_lock_irq(&port->dev_list_lock);
 	list_del_init(&dev->dev_list_node);
+	if (dev_is_sata(dev))
+		sas_ata_end_eh(dev->sata_dev.ap);
 	spin_unlock_irq(&port->dev_list_lock);
 
 	sas_put_device(dev);
@@ -479,9 +481,9 @@ static void sas_chain_event(int event, unsigned long *pending,
 	if (!test_and_set_bit(event, pending)) {
 		unsigned long flags;
 
-		spin_lock_irqsave(&ha->state_lock, flags);
+		spin_lock_irqsave(&ha->lock, flags);
 		sas_chain_work(ha, sw);
-		spin_unlock_irqrestore(&ha->state_lock, flags);
+		spin_unlock_irqrestore(&ha->lock, flags);
 	}
 }
 
diff --git a/drivers/scsi/libsas/sas_event.c b/drivers/scsi/libsas/sas_event.c
index 4e4292d..789c4d8 100644
--- a/drivers/scsi/libsas/sas_event.c
+++ b/drivers/scsi/libsas/sas_event.c
@@ -47,9 +47,9 @@ static void sas_queue_event(int event, unsigned long *pending,
 	if (!test_and_set_bit(event, pending)) {
 		unsigned long flags;
 
-		spin_lock_irqsave(&ha->state_lock, flags);
+		spin_lock_irqsave(&ha->lock, flags);
 		sas_queue_work(ha, work);
-		spin_unlock_irqrestore(&ha->state_lock, flags);
+		spin_unlock_irqrestore(&ha->lock, flags);
 	}
 }
 
@@ -61,18 +61,18 @@ void __sas_drain_work(struct sas_ha_struct *ha)
 
 	set_bit(SAS_HA_DRAINING, &ha->state);
 	/* flush submitters */
-	spin_lock_irq(&ha->state_lock);
-	spin_unlock_irq(&ha->state_lock);
+	spin_lock_irq(&ha->lock);
+	spin_unlock_irq(&ha->lock);
 
 	drain_workqueue(wq);
 
-	spin_lock_irq(&ha->state_lock);
+	spin_lock_irq(&ha->lock);
 	clear_bit(SAS_HA_DRAINING, &ha->state);
 	list_for_each_entry_safe(sw, _sw, &ha->defer_q, drain_node) {
 		list_del_init(&sw->drain_node);
 		sas_queue_work(ha, sw);
 	}
-	spin_unlock_irq(&ha->state_lock);
+	spin_unlock_irq(&ha->lock);
 }
 
 int sas_drain_work(struct sas_ha_struct *ha)
diff --git a/drivers/scsi/libsas/sas_init.c b/drivers/scsi/libsas/sas_init.c
index 10cb5ae..6909fef 100644
--- a/drivers/scsi/libsas/sas_init.c
+++ b/drivers/scsi/libsas/sas_init.c
@@ -114,7 +114,7 @@ int sas_register_ha(struct sas_ha_struct *sas_ha)
 		sas_ha->lldd_queue_size = 128; /* Sanity */
 
 	set_bit(SAS_HA_REGISTERED, &sas_ha->state);
-	spin_lock_init(&sas_ha->state_lock);
+	spin_lock_init(&sas_ha->lock);
 	mutex_init(&sas_ha->drain_mutex);
 	INIT_LIST_HEAD(&sas_ha->defer_q);
 
@@ -163,9 +163,9 @@ int sas_unregister_ha(struct sas_ha_struct *sas_ha)
 	 * events to be queued, and flush any in-progress drainers
 	 */
 	mutex_lock(&sas_ha->drain_mutex);
-	spin_lock_irq(&sas_ha->state_lock);
+	spin_lock_irq(&sas_ha->lock);
 	clear_bit(SAS_HA_REGISTERED, &sas_ha->state);
-	spin_unlock_irq(&sas_ha->state_lock);
+	spin_unlock_irq(&sas_ha->lock);
 	__sas_drain_work(sas_ha);
 	mutex_unlock(&sas_ha->drain_mutex);
 
@@ -411,9 +411,9 @@ static int queue_phy_reset(struct sas_phy *phy, int hard_reset)
 	d->reset_result = 0;
 	d->hard_reset = hard_reset;
 
-	spin_lock_irq(&ha->state_lock);
+	spin_lock_irq(&ha->lock);
 	sas_queue_work(ha, &d->reset_work);
-	spin_unlock_irq(&ha->state_lock);
+	spin_unlock_irq(&ha->lock);
 
 	rc = sas_drain_work(ha);
 	if (rc == 0)
@@ -438,9 +438,9 @@ static int queue_phy_enable(struct sas_phy *phy, int enable)
 	d->enable_result = 0;
 	d->enable = enable;
 
-	spin_lock_irq(&ha->state_lock);
+	spin_lock_irq(&ha->lock);
 	sas_queue_work(ha, &d->enable_work);
-	spin_unlock_irq(&ha->state_lock);
+	spin_unlock_irq(&ha->lock);
 
 	rc = sas_drain_work(ha);
 	if (rc == 0)
diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index 49a9113..cd63d80 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -667,16 +667,20 @@ static void sas_eh_handle_sas_errors(struct Scsi_Host *shost, struct list_head *
 	goto out;
 }
 
+
 void sas_scsi_recover_host(struct Scsi_Host *shost)
 {
 	struct sas_ha_struct *ha = SHOST_TO_SAS_HA(shost);
-	unsigned long flags;
 	LIST_HEAD(eh_work_q);
+	int tries = 0;
+	bool retry;
 
-	spin_lock_irqsave(shost->host_lock, flags);
+retry:
+	tries++;
+	retry = true;
+	spin_lock_irq(shost->host_lock);
 	list_splice_init(&shost->eh_cmd_q, &eh_work_q);
-	shost->host_eh_scheduled = 0;
-	spin_unlock_irqrestore(shost->host_lock, flags);
+	spin_unlock_irq(shost->host_lock);
 
 	SAS_DPRINTK("Enter %s busy: %d failed: %d\n",
 		    __func__, shost->host_busy, shost->host_failed);
@@ -710,8 +714,19 @@ out:
 
 	scsi_eh_flush_done_q(&ha->eh_done_q);
 
-	SAS_DPRINTK("--- Exit %s: busy: %d failed: %d\n",
-		    __func__, shost->host_busy, shost->host_failed);
+	/* check if any new eh work was scheduled during the last run */
+	spin_lock_irq(&ha->lock);
+	if (ha->eh_active == 0) {
+		shost->host_eh_scheduled = 0;
+		retry = false;
+	}
+	spin_unlock_irq(&ha->lock);
+
+	if (retry)
+		goto retry;
+
+	SAS_DPRINTK("--- Exit %s: busy: %d failed: %d tries: %d\n",
+		    __func__, shost->host_busy, shost->host_failed, tries);
 }
 
 enum blk_eh_timer_return sas_scsi_timed_out(struct scsi_cmnd *cmd)
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 42378d6..83ff256 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -845,6 +845,8 @@ struct ata_port_operations {
 	void (*error_handler)(struct ata_port *ap);
 	void (*lost_interrupt)(struct ata_port *ap);
 	void (*post_internal_cmd)(struct ata_queued_cmd *qc);
+	void (*sched_eh)(struct ata_port *ap);
+	void (*end_eh)(struct ata_port *ap);
 
 	/*
 	 * Optional features
@@ -1165,6 +1167,8 @@ extern void ata_do_eh(struct ata_port *ap, ata_prereset_fn_t prereset,
 		      ata_reset_fn_t softreset, ata_reset_fn_t hardreset,
 		      ata_postreset_fn_t postreset);
 extern void ata_std_error_handler(struct ata_port *ap);
+extern void ata_std_sched_eh(struct ata_port *ap);
+extern void ata_std_end_eh(struct ata_port *ap);
 extern int ata_link_nr_enabled(struct ata_link *link);
 
 /*
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index f4f1c96..c0c94b3 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -177,6 +177,7 @@ struct sata_device {
 enum {
 	SAS_DEV_GONE,
 	SAS_DEV_DESTROY,
+	SAS_DEV_EH_PENDING,
 };
 
 struct domain_device {
@@ -384,7 +385,8 @@ struct sas_ha_struct {
 	struct list_head  defer_q; /* work queued while draining */
 	struct mutex	  drain_mutex;
 	unsigned long	  state;
-	spinlock_t 	  state_lock;
+	spinlock_t 	  lock;
+	int		  eh_active;
 
 	struct mutex disco_mutex;
 
diff --git a/include/scsi/sas_ata.h b/include/scsi/sas_ata.h
index 6d5d60c..3e13c28 100644
--- a/include/scsi/sas_ata.h
+++ b/include/scsi/sas_ata.h
@@ -44,6 +44,7 @@ void sas_ata_eh(struct Scsi_Host *shost, struct list_head *work_q,
 void sas_ata_schedule_reset(struct domain_device *dev);
 void sas_ata_wait_eh(struct domain_device *dev);
 void sas_probe_sata(struct asd_sas_port *port);
+void sas_ata_end_eh(struct ata_port *ap);
 #else
 
 
@@ -81,6 +82,10 @@ static inline int sas_get_ata_info(struct domain_device *dev, struct ex_phy *phy
 {
 	return 0;
 }
+
+static inline void sas_ata_end_eh(struct ata_port *ap)
+{
+}
 #endif
 
 #endif /* _SAS_ATA_H_ */


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [libsas PATCH v10 4/9] libsas: enforce eh strategy handlers only in eh context
  2012-03-11  4:38 [libsas PATCH v10 0/9] libsas error handling + discovery v10 Dan Williams
                   ` (2 preceding siblings ...)
  2012-03-11  4:39 ` [libsas PATCH v10 3/9] libata, libsas: introduce sched_eh and end_eh port ops Dan Williams
@ 2012-03-11  4:39 ` Dan Williams
  2012-03-11  4:39 ` [libsas PATCH v10 5/9] libsas: add sas_eh_abort_handler Dan Williams
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Dan Williams @ 2012-03-11  4:39 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide

The strategy handlers may be called in places that are problematic for
libsas (i.e. sata resets outside of domain revalidation filtering /
libata link recovery), or problematic for userspace (non-blocking ioctl
to sleeping reset functions).  However, these routines are also called
for eh escalations and recovery of scsi_eh_prep_cmnd(), so permit them
as long as we are running in the host's error handler, otherwise arrange
for them to be triggered in eh_context.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/libsas/sas_discover.c  |   11 +++
 drivers/scsi/libsas/sas_init.c      |    2 +
 drivers/scsi/libsas/sas_scsi_host.c |  121 ++++++++++++++++++++++++++++++++++-
 include/scsi/libsas.h               |   10 +++
 4 files changed, 140 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
index a730ac9..3c8527d 100644
--- a/drivers/scsi/libsas/sas_discover.c
+++ b/drivers/scsi/libsas/sas_discover.c
@@ -39,6 +39,7 @@ void sas_init_dev(struct domain_device *dev)
 {
         switch (dev->dev_type) {
         case SAS_END_DEV:
+		INIT_LIST_HEAD(&dev->ssp_dev.eh_list_node);
                 break;
         case EDGE_DEV:
         case FANOUT_DEV:
@@ -274,6 +275,8 @@ void sas_free_device(struct kref *kref)
 
 static void sas_unregister_common_dev(struct asd_sas_port *port, struct domain_device *dev)
 {
+	struct sas_ha_struct *ha = port->ha;
+
 	sas_notify_lldd_dev_gone(dev);
 	if (!dev->parent)
 		dev->port->port_dev = NULL;
@@ -286,6 +289,14 @@ static void sas_unregister_common_dev(struct asd_sas_port *port, struct domain_d
 		sas_ata_end_eh(dev->sata_dev.ap);
 	spin_unlock_irq(&port->dev_list_lock);
 
+	spin_lock_irq(&ha->lock);
+	if (dev->dev_type == SAS_END_DEV &&
+	    !list_empty(&dev->ssp_dev.eh_list_node)) {
+		list_del_init(&dev->ssp_dev.eh_list_node);
+		ha->eh_active--;
+	}
+	spin_unlock_irq(&ha->lock);
+
 	sas_put_device(dev);
 }
 
diff --git a/drivers/scsi/libsas/sas_init.c b/drivers/scsi/libsas/sas_init.c
index 6909fef..1bbab3d 100644
--- a/drivers/scsi/libsas/sas_init.c
+++ b/drivers/scsi/libsas/sas_init.c
@@ -116,7 +116,9 @@ int sas_register_ha(struct sas_ha_struct *sas_ha)
 	set_bit(SAS_HA_REGISTERED, &sas_ha->state);
 	spin_lock_init(&sas_ha->lock);
 	mutex_init(&sas_ha->drain_mutex);
+	init_waitqueue_head(&sas_ha->eh_wait_q);
 	INIT_LIST_HEAD(&sas_ha->defer_q);
+	INIT_LIST_HEAD(&sas_ha->eh_dev_q);
 
 	error = sas_register_phys(sas_ha);
 	if (error) {
diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index cd63d80..50a18a6 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -460,14 +460,88 @@ struct sas_phy *sas_get_local_phy(struct domain_device *dev)
 }
 EXPORT_SYMBOL_GPL(sas_get_local_phy);
 
+static void sas_wait_eh(struct domain_device *dev)
+{
+	struct sas_ha_struct *ha = dev->port->ha;
+	DEFINE_WAIT(wait);
+
+	if (dev_is_sata(dev)) {
+		ata_port_wait_eh(dev->sata_dev.ap);
+		return;
+	}
+ retry:
+	spin_lock_irq(&ha->lock);
+
+	while (test_bit(SAS_DEV_EH_PENDING, &dev->state)) {
+		prepare_to_wait(&ha->eh_wait_q, &wait, TASK_UNINTERRUPTIBLE);
+		spin_unlock_irq(&ha->lock);
+		schedule();
+		spin_lock_irq(&ha->lock);
+	}
+	finish_wait(&ha->eh_wait_q, &wait);
+
+	spin_unlock_irq(&ha->lock);
+
+	/* make sure SCSI EH is complete */
+	if (scsi_host_in_recovery(ha->core.shost)) {
+		msleep(10);
+		goto retry;
+	}
+}
+EXPORT_SYMBOL(sas_wait_eh);
+
+static int sas_queue_reset(struct domain_device *dev, int reset_type, int lun, int wait)
+{
+	struct sas_ha_struct *ha = dev->port->ha;
+	int scheduled = 0, tries = 100;
+
+	/* ata: promote lun reset to bus reset */
+	if (dev_is_sata(dev)) {
+		sas_ata_schedule_reset(dev);
+		if (wait)
+			sas_ata_wait_eh(dev);
+		return SUCCESS;
+	}
+
+	while (!scheduled && tries--) {
+		spin_lock_irq(&ha->lock);
+		if (!test_bit(SAS_DEV_EH_PENDING, &dev->state) &&
+		    !test_bit(reset_type, &dev->state)) {
+			scheduled = 1;
+			ha->eh_active++;
+			list_add_tail(&dev->ssp_dev.eh_list_node, &ha->eh_dev_q);
+			set_bit(SAS_DEV_EH_PENDING, &dev->state);
+			set_bit(reset_type, &dev->state);
+			int_to_scsilun(lun, &dev->ssp_dev.reset_lun);
+			scsi_schedule_eh(ha->core.shost);
+		}
+		spin_unlock_irq(&ha->lock);
+
+		if (wait)
+			sas_wait_eh(dev);
+
+		if (scheduled)
+			return SUCCESS;
+	}
+
+	SAS_DPRINTK("%s reset of %s failed\n",
+		    reset_type == SAS_DEV_LU_RESET ? "LUN" : "Bus",
+		    dev_name(&dev->rphy->dev));
+
+	return FAILED;
+}
+
 /* Attempt to send a LUN reset message to a device */
 int sas_eh_device_reset_handler(struct scsi_cmnd *cmd)
 {
-	struct domain_device *dev = cmd_to_domain_dev(cmd);
-	struct sas_internal *i =
-		to_sas_internal(dev->port->ha->core.shost->transportt);
-	struct scsi_lun lun;
 	int res;
+	struct scsi_lun lun;
+	struct Scsi_Host *host = cmd->device->host;
+	struct domain_device *dev = cmd_to_domain_dev(cmd);
+	struct sas_internal *i = to_sas_internal(host->transportt);
+
+	if (current != host->ehandler)
+		return sas_queue_reset(dev, SAS_DEV_LU_RESET, cmd->device->lun, 0);
 
 	int_to_scsilun(cmd->device->lun, &lun);
 
@@ -486,8 +560,12 @@ int sas_eh_bus_reset_handler(struct scsi_cmnd *cmd)
 {
 	struct domain_device *dev = cmd_to_domain_dev(cmd);
 	struct sas_phy *phy = sas_get_local_phy(dev);
+	struct Scsi_Host *host = cmd->device->host;
 	int res;
 
+	if (current != host->ehandler)
+		return sas_queue_reset(dev, SAS_DEV_RESET, 0, 0);
+
 	res = sas_phy_reset(phy, 1);
 	if (res)
 		SAS_DPRINTK("Bus reset of %s failed 0x%x\n",
@@ -667,6 +745,39 @@ static void sas_eh_handle_sas_errors(struct Scsi_Host *shost, struct list_head *
 	goto out;
 }
 
+static void sas_eh_handle_resets(struct Scsi_Host *shost)
+{
+	struct sas_ha_struct *ha = SHOST_TO_SAS_HA(shost);
+	struct sas_internal *i = to_sas_internal(shost->transportt);
+
+	/* handle directed resets to sas devices */
+	spin_lock_irq(&ha->lock);
+	while (!list_empty(&ha->eh_dev_q)) {
+		struct domain_device *dev;
+		struct ssp_device *ssp;
+
+		ssp = list_entry(ha->eh_dev_q.next, typeof(*ssp), eh_list_node);
+		list_del_init(&ssp->eh_list_node);
+		dev = container_of(ssp, typeof(*dev), ssp_dev);
+		kref_get(&dev->kref);
+		WARN_ONCE(dev_is_sata(dev), "ssp reset to ata device?\n");
+
+		spin_unlock_irq(&ha->lock);
+
+		if (test_and_clear_bit(SAS_DEV_LU_RESET, &dev->state))
+			i->dft->lldd_lu_reset(dev, ssp->reset_lun.scsi_lun);
+
+		if (test_and_clear_bit(SAS_DEV_RESET, &dev->state))
+			i->dft->lldd_I_T_nexus_reset(dev);
+
+		sas_put_device(dev);
+		spin_lock_irq(&ha->lock);
+		clear_bit(SAS_DEV_EH_PENDING, &dev->state);
+		ha->eh_active--;
+	}
+	spin_unlock_irq(&ha->lock);
+}
+
 
 void sas_scsi_recover_host(struct Scsi_Host *shost)
 {
@@ -709,6 +820,8 @@ out:
 	if (ha->lldd_max_execute_num > 1)
 		wake_up_process(ha->core.queue_thread);
 
+	sas_eh_handle_resets(shost);
+
 	/* now link into libata eh --- if we have any ata devices */
 	sas_ata_strategy_handler(shost);
 
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index c0c94b3..1b14d80 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -174,10 +174,17 @@ struct sata_device {
 	struct ata_taskfile tf;
 };
 
+struct ssp_device {
+	struct list_head eh_list_node; /* pending a user requested eh action */
+	struct scsi_lun reset_lun;
+};
+
 enum {
 	SAS_DEV_GONE,
 	SAS_DEV_DESTROY,
 	SAS_DEV_EH_PENDING,
+	SAS_DEV_LU_RESET,
+	SAS_DEV_RESET,
 };
 
 struct domain_device {
@@ -211,6 +218,7 @@ struct domain_device {
         union {
                 struct expander_device ex_dev;
                 struct sata_device     sata_dev; /* STP & directly attached */
+		struct ssp_device      ssp_dev;
         };
 
         void *lldd_dev;
@@ -387,6 +395,8 @@ struct sas_ha_struct {
 	unsigned long	  state;
 	spinlock_t 	  lock;
 	int		  eh_active;
+	wait_queue_head_t eh_wait_q;
+	struct list_head  eh_dev_q;
 
 	struct mutex disco_mutex;
 


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [libsas PATCH v10 5/9] libsas: add sas_eh_abort_handler
  2012-03-11  4:38 [libsas PATCH v10 0/9] libsas error handling + discovery v10 Dan Williams
                   ` (3 preceding siblings ...)
  2012-03-11  4:39 ` [libsas PATCH v10 4/9] libsas: enforce eh strategy handlers only in eh context Dan Williams
@ 2012-03-11  4:39 ` Dan Williams
  2012-03-11  4:39 ` [libsas PATCH v10 6/9] libsas: use ->lldd_I_T_nexus_reset for ->eh_bus_reset_handler Dan Williams
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Dan Williams @ 2012-03-11  4:39 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide, Jacek Danecki

When recovering failed eh-cmnds let the lldd attempt an abort via
scsi_abort_eh_cmnd before escalating.

Acked-by: Jacek Danecki <jacek.danecki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/libsas/sas_scsi_host.c |   21 +++++++++++++++++++++
 include/scsi/libsas.h               |    1 +
 2 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index 50a18a6..dec992b 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -531,6 +531,27 @@ static int sas_queue_reset(struct domain_device *dev, int reset_type, int lun, i
 	return FAILED;
 }
 
+int sas_eh_abort_handler(struct scsi_cmnd *cmd)
+{
+	int res;
+	struct sas_task *task = TO_SAS_TASK(cmd);
+	struct Scsi_Host *host = cmd->device->host;
+	struct sas_internal *i = to_sas_internal(host->transportt);
+
+	if (current != host->ehandler)
+		return FAILED;
+
+	if (!i->dft->lldd_abort_task)
+		return FAILED;
+
+	res = i->dft->lldd_abort_task(task);
+	if (res == TMF_RESP_FUNC_SUCC || res == TMF_RESP_FUNC_COMPLETE)
+		return SUCCESS;
+
+	return FAILED;
+}
+EXPORT_SYMBOL_GPL(sas_eh_abort_handler);
+
 /* Attempt to send a LUN reset message to a device */
 int sas_eh_device_reset_handler(struct scsi_cmnd *cmd)
 {
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index 1b14d80..45f9534 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -718,6 +718,7 @@ void sas_unregister_dev(struct asd_sas_port *port, struct domain_device *);
 void sas_init_dev(struct domain_device *);
 
 void sas_task_abort(struct sas_task *);
+int sas_eh_abort_handler(struct scsi_cmnd *cmd);
 int sas_eh_device_reset_handler(struct scsi_cmnd *cmd);
 int sas_eh_bus_reset_handler(struct scsi_cmnd *cmd);
 


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [libsas PATCH v10 6/9] libsas: use ->lldd_I_T_nexus_reset for ->eh_bus_reset_handler
  2012-03-11  4:38 [libsas PATCH v10 0/9] libsas error handling + discovery v10 Dan Williams
                   ` (4 preceding siblings ...)
  2012-03-11  4:39 ` [libsas PATCH v10 5/9] libsas: add sas_eh_abort_handler Dan Williams
@ 2012-03-11  4:39 ` Dan Williams
  2012-03-11  5:37   ` jack_wang
  2012-03-11  4:39 ` [libsas PATCH v10 7/9] libsas: trim sas_task of slow path infrastructure Dan Williams
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 15+ messages in thread
From: Dan Williams @ 2012-03-11  4:39 UTC (permalink / raw)
  To: linux-scsi
  Cc: linux-ide, Jacek Danecki, Jack Wang, Luben Tuikov, Xiangliang Yu

sas_eh_bus_reset_handler() amounts to sas_phy_reset() without
notification of the reset to the lldd.  If this is triggered from
eh-cmnd recovery there may be sas_tasks for the lldd to terminate, so
->lldd_I_T_nexus_reset is warranted.

Cc: Xiangliang Yu <yuxiangl@marvell.com>
Cc: Luben Tuikov <ltuikov@yahoo.com>
Cc: Jack Wang <jack_wang@usish.com>
Reviewed-by: Jacek Danecki <jacek.danecki@intel.com>
[jacek: modify pm8001_I_T_nexus_reset to return -ENODEV]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/libsas/sas_scsi_host.c |   19 ++++++++-----------
 drivers/scsi/pm8001/pm8001_sas.c    |    3 ++-
 2 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index dec992b..a5c4ce4 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -576,25 +576,22 @@ int sas_eh_device_reset_handler(struct scsi_cmnd *cmd)
 	return FAILED;
 }
 
-/* Attempt to send a phy (bus) reset */
 int sas_eh_bus_reset_handler(struct scsi_cmnd *cmd)
 {
-	struct domain_device *dev = cmd_to_domain_dev(cmd);
-	struct sas_phy *phy = sas_get_local_phy(dev);
-	struct Scsi_Host *host = cmd->device->host;
 	int res;
+	struct Scsi_Host *host = cmd->device->host;
+	struct domain_device *dev = cmd_to_domain_dev(cmd);
+	struct sas_internal *i = to_sas_internal(host->transportt);
 
 	if (current != host->ehandler)
 		return sas_queue_reset(dev, SAS_DEV_RESET, 0, 0);
 
-	res = sas_phy_reset(phy, 1);
-	if (res)
-		SAS_DPRINTK("Bus reset of %s failed 0x%x\n",
-			    kobject_name(&phy->dev.kobj),
-			    res);
-	sas_put_local_phy(phy);
+	if (!i->dft->lldd_I_T_nexus_reset)
+		return FAILED;
 
-	if (res == TMF_RESP_FUNC_SUCC || res == TMF_RESP_FUNC_COMPLETE)
+	res = i->dft->lldd_I_T_nexus_reset(dev);
+	if (res == TMF_RESP_FUNC_SUCC || res == TMF_RESP_FUNC_COMPLETE ||
+	    res == -ENODEV)
 		return SUCCESS;
 
 	return FAILED;
diff --git a/drivers/scsi/pm8001/pm8001_sas.c b/drivers/scsi/pm8001/pm8001_sas.c
index b111018..3a03b40 100644
--- a/drivers/scsi/pm8001/pm8001_sas.c
+++ b/drivers/scsi/pm8001/pm8001_sas.c
@@ -868,8 +868,9 @@ int pm8001_I_T_nexus_reset(struct domain_device *dev)
 	struct pm8001_device *pm8001_dev;
 	struct pm8001_hba_info *pm8001_ha;
 	struct sas_phy *phy;
+
 	if (!dev || !dev->lldd_dev)
-		return -1;
+		return -ENODEV;
 
 	pm8001_dev = dev->lldd_dev;
 	pm8001_ha = pm8001_find_ha_by_dev(dev);


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [libsas PATCH v10 7/9] libsas: trim sas_task of slow path infrastructure
  2012-03-11  4:38 [libsas PATCH v10 0/9] libsas error handling + discovery v10 Dan Williams
                   ` (5 preceding siblings ...)
  2012-03-11  4:39 ` [libsas PATCH v10 6/9] libsas: use ->lldd_I_T_nexus_reset for ->eh_bus_reset_handler Dan Williams
@ 2012-03-11  4:39 ` Dan Williams
  2012-03-11  4:39 ` [libsas PATCH v10 8/9] libsas: fix sas_find_bcast_phy() in the presence of 'vacant' phys Dan Williams
  2012-03-11  4:39 ` [libsas PATCH v10 9/9] libsas: sas_rediscover_dev did not look at the SMP exec status Dan Williams
  8 siblings, 0 replies; 15+ messages in thread
From: Dan Williams @ 2012-03-11  4:39 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide, Christoph Hellwig, Jack Wang

The timer and the completion are only used for slow path tasks (smp, and
lldd tmfs), yet we incur the allocation space and cpu setup time for
every fast path task.

Cc: Christoph Hellwig <hch@lst.de>
Acked-by: Jack Wang <jack_wang@usish.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/libsas/sas_expander.c  |   20 ++++++++++----------
 drivers/scsi/libsas/sas_init.c      |   23 +++++++++++++++++++++--
 drivers/scsi/libsas/sas_scsi_host.c |    8 ++++++--
 drivers/scsi/mvsas/mv_sas.c         |   20 ++++++++++----------
 drivers/scsi/pm8001/pm8001_sas.c    |   34 +++++++++++++++++-----------------
 include/scsi/libsas.h               |   14 +++++++++-----
 6 files changed, 73 insertions(+), 46 deletions(-)

diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index 05acd9e..0ab3796 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -51,14 +51,14 @@ static void smp_task_timedout(unsigned long _task)
 		task->task_state_flags |= SAS_TASK_STATE_ABORTED;
 	spin_unlock_irqrestore(&task->task_state_lock, flags);
 
-	complete(&task->completion);
+	complete(&task->slow_task->completion);
 }
 
 static void smp_task_done(struct sas_task *task)
 {
-	if (!del_timer(&task->timer))
+	if (!del_timer(&task->slow_task->timer))
 		return;
-	complete(&task->completion);
+	complete(&task->slow_task->completion);
 }
 
 /* Give it some long enough timeout. In seconds. */
@@ -79,7 +79,7 @@ static int smp_execute_task(struct domain_device *dev, void *req, int req_size,
 			break;
 		}
 
-		task = sas_alloc_task(GFP_KERNEL);
+		task = sas_alloc_slow_task(GFP_KERNEL);
 		if (!task) {
 			res = -ENOMEM;
 			break;
@@ -91,20 +91,20 @@ static int smp_execute_task(struct domain_device *dev, void *req, int req_size,
 
 		task->task_done = smp_task_done;
 
-		task->timer.data = (unsigned long) task;
-		task->timer.function = smp_task_timedout;
-		task->timer.expires = jiffies + SMP_TIMEOUT*HZ;
-		add_timer(&task->timer);
+		task->slow_task->timer.data = (unsigned long) task;
+		task->slow_task->timer.function = smp_task_timedout;
+		task->slow_task->timer.expires = jiffies + SMP_TIMEOUT*HZ;
+		add_timer(&task->slow_task->timer);
 
 		res = i->dft->lldd_execute_task(task, 1, GFP_KERNEL);
 
 		if (res) {
-			del_timer(&task->timer);
+			del_timer(&task->slow_task->timer);
 			SAS_DPRINTK("executing SMP task failed:%d\n", res);
 			break;
 		}
 
-		wait_for_completion(&task->completion);
+		wait_for_completion(&task->slow_task->completion);
 		res = -ECOMM;
 		if ((task->task_state_flags & SAS_TASK_STATE_ABORTED)) {
 			SAS_DPRINTK("smp task timed out or aborted\n");
diff --git a/drivers/scsi/libsas/sas_init.c b/drivers/scsi/libsas/sas_init.c
index 1bbab3d..014297c 100644
--- a/drivers/scsi/libsas/sas_init.c
+++ b/drivers/scsi/libsas/sas_init.c
@@ -48,18 +48,37 @@ struct sas_task *sas_alloc_task(gfp_t flags)
 		INIT_LIST_HEAD(&task->list);
 		spin_lock_init(&task->task_state_lock);
 		task->task_state_flags = SAS_TASK_STATE_PENDING;
-		init_timer(&task->timer);
-		init_completion(&task->completion);
 	}
 
 	return task;
 }
 EXPORT_SYMBOL_GPL(sas_alloc_task);
 
+struct sas_task *sas_alloc_slow_task(gfp_t flags)
+{
+	struct sas_task *task = sas_alloc_task(flags);
+	struct sas_task_slow *slow = kmalloc(sizeof(*slow), flags);
+
+	if (!task || !slow) {
+		if (task)
+			kmem_cache_free(sas_task_cache, task);
+		kfree(slow);
+		return NULL;
+	}
+
+	task->slow_task = slow;
+	init_timer(&slow->timer);
+	init_completion(&slow->completion);
+
+	return task;
+}
+EXPORT_SYMBOL_GPL(sas_alloc_slow_task);
+
 void sas_free_task(struct sas_task *task)
 {
 	if (task) {
 		BUG_ON(!list_empty(&task->list));
+		kfree(task->slow_task);
 		kmem_cache_free(sas_task_cache, task);
 	}
 }
diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index a5c4ce4..3db5225 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -1134,9 +1134,13 @@ void sas_task_abort(struct sas_task *task)
 
 	/* Escape for libsas internal commands */
 	if (!sc) {
-		if (!del_timer(&task->timer))
+		struct sas_task_slow *slow = task->slow_task;
+
+		if (!slow)
+			return;
+		if (!del_timer(&slow->timer))
 			return;
-		task->timer.function(task->timer.data);
+		slow->timer.function(slow->timer.data);
 		return;
 	} else {
 		struct request_queue *q = sc->device->request_queue;
diff --git a/drivers/scsi/mvsas/mv_sas.c b/drivers/scsi/mvsas/mv_sas.c
index b68a653..d0462b8 100644
--- a/drivers/scsi/mvsas/mv_sas.c
+++ b/drivers/scsi/mvsas/mv_sas.c
@@ -1365,9 +1365,9 @@ void mvs_dev_gone(struct domain_device *dev)
 
 static void mvs_task_done(struct sas_task *task)
 {
-	if (!del_timer(&task->timer))
+	if (!del_timer(&task->slow_task->timer))
 		return;
-	complete(&task->completion);
+	complete(&task->slow_task->completion);
 }
 
 static void mvs_tmf_timedout(unsigned long data)
@@ -1375,7 +1375,7 @@ static void mvs_tmf_timedout(unsigned long data)
 	struct sas_task *task = (struct sas_task *)data;
 
 	task->task_state_flags |= SAS_TASK_STATE_ABORTED;
-	complete(&task->completion);
+	complete(&task->slow_task->completion);
 }
 
 #define MVS_TASK_TIMEOUT 20
@@ -1386,7 +1386,7 @@ static int mvs_exec_internal_tmf_task(struct domain_device *dev,
 	struct sas_task *task = NULL;
 
 	for (retry = 0; retry < 3; retry++) {
-		task = sas_alloc_task(GFP_KERNEL);
+		task = sas_alloc_slow_task(GFP_KERNEL);
 		if (!task)
 			return -ENOMEM;
 
@@ -1396,20 +1396,20 @@ static int mvs_exec_internal_tmf_task(struct domain_device *dev,
 		memcpy(&task->ssp_task, parameter, para_len);
 		task->task_done = mvs_task_done;
 
-		task->timer.data = (unsigned long) task;
-		task->timer.function = mvs_tmf_timedout;
-		task->timer.expires = jiffies + MVS_TASK_TIMEOUT*HZ;
-		add_timer(&task->timer);
+		task->slow_task->timer.data = (unsigned long) task;
+		task->slow_task->timer.function = mvs_tmf_timedout;
+		task->slow_task->timer.expires = jiffies + MVS_TASK_TIMEOUT*HZ;
+		add_timer(&task->slow_task->timer);
 
 		res = mvs_task_exec(task, 1, GFP_KERNEL, NULL, 1, tmf);
 
 		if (res) {
-			del_timer(&task->timer);
+			del_timer(&task->slow_task->timer);
 			mv_printk("executing internel task failed:%d\n", res);
 			goto ex_err;
 		}
 
-		wait_for_completion(&task->completion);
+		wait_for_completion(&task->slow_task->completion);
 		res = TMF_RESP_FUNC_FAILED;
 		/* Even TMF timed out, return direct. */
 		if ((task->task_state_flags & SAS_TASK_STATE_ABORTED)) {
diff --git a/drivers/scsi/pm8001/pm8001_sas.c b/drivers/scsi/pm8001/pm8001_sas.c
index 3a03b40..8468f0e 100644
--- a/drivers/scsi/pm8001/pm8001_sas.c
+++ b/drivers/scsi/pm8001/pm8001_sas.c
@@ -627,9 +627,9 @@ int pm8001_dev_found(struct domain_device *dev)
 
 static void pm8001_task_done(struct sas_task *task)
 {
-	if (!del_timer(&task->timer))
+	if (!del_timer(&task->slow_task->timer))
 		return;
-	complete(&task->completion);
+	complete(&task->slow_task->completion);
 }
 
 static void pm8001_tmf_timedout(unsigned long data)
@@ -637,7 +637,7 @@ static void pm8001_tmf_timedout(unsigned long data)
 	struct sas_task *task = (struct sas_task *)data;
 
 	task->task_state_flags |= SAS_TASK_STATE_ABORTED;
-	complete(&task->completion);
+	complete(&task->slow_task->completion);
 }
 
 #define PM8001_TASK_TIMEOUT 20
@@ -660,7 +660,7 @@ static int pm8001_exec_internal_tmf_task(struct domain_device *dev,
 	struct pm8001_hba_info *pm8001_ha = pm8001_find_ha_by_dev(dev);
 
 	for (retry = 0; retry < 3; retry++) {
-		task = sas_alloc_task(GFP_KERNEL);
+		task = sas_alloc_slow_task(GFP_KERNEL);
 		if (!task)
 			return -ENOMEM;
 
@@ -668,21 +668,21 @@ static int pm8001_exec_internal_tmf_task(struct domain_device *dev,
 		task->task_proto = dev->tproto;
 		memcpy(&task->ssp_task, parameter, para_len);
 		task->task_done = pm8001_task_done;
-		task->timer.data = (unsigned long)task;
-		task->timer.function = pm8001_tmf_timedout;
-		task->timer.expires = jiffies + PM8001_TASK_TIMEOUT*HZ;
-		add_timer(&task->timer);
+		task->slow_task->timer.data = (unsigned long)task;
+		task->slow_task->timer.function = pm8001_tmf_timedout;
+		task->slow_task->timer.expires = jiffies + PM8001_TASK_TIMEOUT*HZ;
+		add_timer(&task->slow_task->timer);
 
 		res = pm8001_task_exec(task, 1, GFP_KERNEL, 1, tmf);
 
 		if (res) {
-			del_timer(&task->timer);
+			del_timer(&task->slow_task->timer);
 			PM8001_FAIL_DBG(pm8001_ha,
 				pm8001_printk("Executing internal task "
 				"failed\n"));
 			goto ex_err;
 		}
-		wait_for_completion(&task->completion);
+		wait_for_completion(&task->slow_task->completion);
 		res = -TMF_RESP_FUNC_FAILED;
 		/* Even TMF timed out, return direct. */
 		if ((task->task_state_flags & SAS_TASK_STATE_ABORTED)) {
@@ -742,17 +742,17 @@ pm8001_exec_internal_task_abort(struct pm8001_hba_info *pm8001_ha,
 	struct sas_task *task = NULL;
 
 	for (retry = 0; retry < 3; retry++) {
-		task = sas_alloc_task(GFP_KERNEL);
+		task = sas_alloc_slow_task(GFP_KERNEL);
 		if (!task)
 			return -ENOMEM;
 
 		task->dev = dev;
 		task->task_proto = dev->tproto;
 		task->task_done = pm8001_task_done;
-		task->timer.data = (unsigned long)task;
-		task->timer.function = pm8001_tmf_timedout;
-		task->timer.expires = jiffies + PM8001_TASK_TIMEOUT * HZ;
-		add_timer(&task->timer);
+		task->slow_task->timer.data = (unsigned long)task;
+		task->slow_task->timer.function = pm8001_tmf_timedout;
+		task->slow_task->timer.expires = jiffies + PM8001_TASK_TIMEOUT * HZ;
+		add_timer(&task->slow_task->timer);
 
 		res = pm8001_tag_alloc(pm8001_ha, &ccb_tag);
 		if (res)
@@ -766,13 +766,13 @@ pm8001_exec_internal_task_abort(struct pm8001_hba_info *pm8001_ha,
 			pm8001_dev, flag, task_tag, ccb_tag);
 
 		if (res) {
-			del_timer(&task->timer);
+			del_timer(&task->slow_task->timer);
 			PM8001_FAIL_DBG(pm8001_ha,
 				pm8001_printk("Executing internal task "
 				"failed\n"));
 			goto ex_err;
 		}
-		wait_for_completion(&task->completion);
+		wait_for_completion(&task->slow_task->completion);
 		res = TMF_RESP_FUNC_FAILED;
 		/* Even TMF timed out, return direct. */
 		if ((task->task_state_flags & SAS_TASK_STATE_ABORTED)) {
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index 45f9534..f66e83b 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -612,10 +612,6 @@ struct sas_task {
 
 	enum   sas_protocol      task_proto;
 
-	/* Used by the discovery code. */
-	struct timer_list     timer;
-	struct completion     completion;
-
 	union {
 		struct sas_ata_task ata_task;
 		struct sas_smp_task smp_task;
@@ -632,8 +628,15 @@ struct sas_task {
 
 	void   *lldd_task;	  /* for use by LLDDs */
 	void   *uldd_task;
+	struct sas_task_slow *slow_task;
+};
 
-	struct work_struct abort_work;
+struct sas_task_slow {
+	/* standard/extra infrastructure for slow path commands (SMP and
+	 * internal lldd commands
+	 */
+	struct timer_list     timer;
+	struct completion     completion;
 };
 
 #define SAS_TASK_STATE_PENDING      1
@@ -643,6 +646,7 @@ struct sas_task {
 #define SAS_TASK_AT_INITIATOR       16
 
 extern struct sas_task *sas_alloc_task(gfp_t flags);
+extern struct sas_task *sas_alloc_slow_task(gfp_t flags);
 extern void sas_free_task(struct sas_task *task);
 
 struct sas_domain_function_template {


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [libsas PATCH v10 8/9] libsas: fix sas_find_bcast_phy() in the presence of 'vacant' phys
  2012-03-11  4:38 [libsas PATCH v10 0/9] libsas error handling + discovery v10 Dan Williams
                   ` (6 preceding siblings ...)
  2012-03-11  4:39 ` [libsas PATCH v10 7/9] libsas: trim sas_task of slow path infrastructure Dan Williams
@ 2012-03-11  4:39 ` Dan Williams
  2012-03-11  4:39 ` [libsas PATCH v10 9/9] libsas: sas_rediscover_dev did not look at the SMP exec status Dan Williams
  8 siblings, 0 replies; 15+ messages in thread
From: Dan Williams @ 2012-03-11  4:39 UTC (permalink / raw)
  To: linux-scsi; +Cc: Thomas Jackson, linux-ide, stable

From: Thomas Jackson <thomas.p.jackson@intel.com>

If an expander reports 'PHY VACANT' for a phy index prior to the one
that generated a BCN libsas fails rediscovery.  Since a vacant phy is
defined as a valid phy index that will never have an attached device
just continue the search.

Cc: <stable@vger.kernel.org>
Signed-off-by: Thomas Jackson <thomas.p.jackson@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/libsas/sas_expander.c |   17 ++++++++++++-----
 1 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index 0ab3796..cacf3fe 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -1718,9 +1718,17 @@ static int sas_find_bcast_phy(struct domain_device *dev, int *phy_id,
 		int phy_change_count = 0;
 
 		res = sas_get_phy_change_count(dev, i, &phy_change_count);
-		if (res)
-			goto out;
-		else if (phy_change_count != ex->ex_phy[i].phy_change_count) {
+		switch (res) {
+		case SMP_RESP_PHY_VACANT:
+		case SMP_RESP_NO_PHY:
+			continue;
+		case SMP_RESP_FUNC_ACC:
+			break;
+		default:
+			return res;
+		}
+
+		if (phy_change_count != ex->ex_phy[i].phy_change_count) {
 			if (update)
 				ex->ex_phy[i].phy_change_count =
 					phy_change_count;
@@ -1728,8 +1736,7 @@ static int sas_find_bcast_phy(struct domain_device *dev, int *phy_id,
 			return 0;
 		}
 	}
-out:
-	return res;
+	return 0;
 }
 
 static int sas_get_ex_change_count(struct domain_device *dev, int *ecc)


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [libsas PATCH v10 9/9] libsas: sas_rediscover_dev did not look at the SMP exec status.
  2012-03-11  4:38 [libsas PATCH v10 0/9] libsas error handling + discovery v10 Dan Williams
                   ` (7 preceding siblings ...)
  2012-03-11  4:39 ` [libsas PATCH v10 8/9] libsas: fix sas_find_bcast_phy() in the presence of 'vacant' phys Dan Williams
@ 2012-03-11  4:39 ` Dan Williams
  8 siblings, 0 replies; 15+ messages in thread
From: Dan Williams @ 2012-03-11  4:39 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide, Jeff Skirvin

From: Jeff Skirvin <jeffrey.d.skirvin@intel.com>

The discovery function "sas_rediscover_dev" had two bugs: 1) it did
not pay attention to the return status from the SMP task execution;
2) the stack variable used for the returned SAS address was compared
against 0 without being initialized.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
[djbw: todo sanitize smp_execute_task return values (see: residue)]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/libsas/sas_expander.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index cacf3fe..e9423f3 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -1980,6 +1980,7 @@ static int sas_rediscover_dev(struct domain_device *dev, int phy_id, bool last)
 	u8 sas_addr[8];
 	int res;
 
+	memset(sas_addr, 0, 8);
 	res = sas_get_phy_attached_dev(dev, phy_id, sas_addr, &type);
 	switch (res) {
 	case SMP_RESP_NO_PHY:
@@ -1992,9 +1993,13 @@ static int sas_rediscover_dev(struct domain_device *dev, int phy_id, bool last)
 		return res;
 	case SMP_RESP_FUNC_ACC:
 		break;
+	case -ECOMM:
+		break;
+	default:
+		return res;
 	}
 
-	if (SAS_ADDR(sas_addr) == 0) {
+	if ((SAS_ADDR(sas_addr) == 0) || (res == -ECOMM)) {
 		phy->phy_state = PHY_EMPTY;
 		sas_unregister_devs_sas_addr(dev, phy_id, last);
 		return res;


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [libsas PATCH v10 6/9] libsas: use ->lldd_I_T_nexus_reset for ->eh_bus_reset_handler
  2012-03-11  4:39 ` [libsas PATCH v10 6/9] libsas: use ->lldd_I_T_nexus_reset for ->eh_bus_reset_handler Dan Williams
@ 2012-03-11  5:37   ` jack_wang
  0 siblings, 0 replies; 15+ messages in thread
From: jack_wang @ 2012-03-11  5:37 UTC (permalink / raw)
  To: Dan Williams, linux-scsi
  Cc: linux-ide, Jacek Danecki, Luben Tuikov, Xiangliang Yu

Looks OK to me.
Thanks.
Acked-by: Jack Wang <jack_wang@usish.com>


--------------
jack_wang
>sas_eh_bus_reset_handler() amounts to sas_phy_reset() without
>notification of the reset to the lldd.  If this is triggered from
>eh-cmnd recovery there may be sas_tasks for the lldd to terminate, so
>->lldd_I_T_nexus_reset is warranted.
>
>Cc: Xiangliang Yu <yuxiangl@marvell.com>
>Cc: Luben Tuikov <ltuikov@yahoo.com>
>Cc: Jack Wang <jack_wang@usish.com>
>Reviewed-by: Jacek Danecki <jacek.danecki@intel.com>
>[jacek: modify pm8001_I_T_nexus_reset to return -ENODEV]
>Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>---
> drivers/scsi/libsas/sas_scsi_host.c |   19 ++++++++-----------
> drivers/scsi/pm8001/pm8001_sas.c    |    3 ++-
> 2 files changed, 10 insertions(+), 12 deletions(-)
>
>diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
>index dec992b..a5c4ce4 100644
>--- a/drivers/scsi/libsas/sas_scsi_host.c
>+++ b/drivers/scsi/libsas/sas_scsi_host.c
>@@ -576,25 +576,22 @@ int sas_eh_device_reset_handler(struct scsi_cmnd *cmd)
> 	return FAILED;
> }
> 
>-/* Attempt to send a phy (bus) reset */
> int sas_eh_bus_reset_handler(struct scsi_cmnd *cmd)
> {
>-	struct domain_device *dev = cmd_to_domain_dev(cmd);
>-	struct sas_phy *phy = sas_get_local_phy(dev);
>-	struct Scsi_Host *host = cmd->device->host;
> 	int res;
>+	struct Scsi_Host *host = cmd->device->host;
>+	struct domain_device *dev = cmd_to_domain_dev(cmd);
>+	struct sas_internal *i = to_sas_internal(host->transportt);
> 
> 	if (current != host->ehandler)
> 		return sas_queue_reset(dev, SAS_DEV_RESET, 0, 0);
> 
>-	res = sas_phy_reset(phy, 1);
>-	if (res)
>-		SAS_DPRINTK("Bus reset of %s failed 0x%x\n",
>-			    kobject_name(&phy->dev.kobj),
>-			    res);
>-	sas_put_local_phy(phy);
>+	if (!i->dft->lldd_I_T_nexus_reset)
>+		return FAILED;
> 
>-	if (res == TMF_RESP_FUNC_SUCC || res == TMF_RESP_FUNC_COMPLETE)
>+	res = i->dft->lldd_I_T_nexus_reset(dev);
>+	if (res == TMF_RESP_FUNC_SUCC || res == TMF_RESP_FUNC_COMPLETE ||
>+	    res == -ENODEV)
> 		return SUCCESS;
> 
> 	return FAILED;
>diff --git a/drivers/scsi/pm8001/pm8001_sas.c b/drivers/scsi/pm8001/pm8001_sas.c
>index b111018..3a03b40 100644
>--- a/drivers/scsi/pm8001/pm8001_sas.c
>+++ b/drivers/scsi/pm8001/pm8001_sas.c
>@@ -868,8 +868,9 @@ int pm8001_I_T_nexus_reset(struct domain_device *dev)
> 	struct pm8001_device *pm8001_dev;
> 	struct pm8001_hba_info *pm8001_ha;
> 	struct sas_phy *phy;
>+
> 	if (!dev || !dev->lldd_dev)
>-		return -1;
>+		return -ENODEV;
> 
> 	pm8001_dev = dev->lldd_dev;
> 	pm8001_ha = pm8001_find_ha_by_dev(dev);
>
>--
>To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>__________ Information from ESET NOD32 Antivirus, version of virus signature database 5659 (20101129) __________
>
>The message was checked by ESET NOD32 Antivirus.
>
>http://www.eset.com
>
>
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [libsas PATCH v10 2/9] libsas: cleanup spurious calls to scsi_schedule_eh
  2012-03-11  4:39 ` [libsas PATCH v10 2/9] libsas: cleanup spurious calls to scsi_schedule_eh Dan Williams
@ 2012-03-16 15:25   ` Dan Williams
  0 siblings, 0 replies; 15+ messages in thread
From: Dan Williams @ 2012-03-16 15:25 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide

On Sat, Mar 10, 2012 at 8:39 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> eh is woken up automatically by the presence of failed commands,
> scsi_schedule_eh is reserved for cases where there are no failed
> commands.  This guarantees that host_eh_sceduled is only incremented
> when an explicit eh request is made.
>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/scsi/libsas/sas_ata.c       |   25 -------------------------
>  drivers/scsi/libsas/sas_scsi_host.c |    5 -----
>  include/scsi/sas_ata.h              |    4 ----
>  3 files changed, 0 insertions(+), 34 deletions(-)
>
> diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
> index bc0cecc..729a7b6 100644
> --- a/drivers/scsi/libsas/sas_ata.c
> +++ b/drivers/scsi/libsas/sas_ata.c
> @@ -574,31 +574,6 @@ int sas_ata_init_host_and_port(struct domain_device *found_dev)
>        return 0;
>  }
>
> -void sas_ata_task_abort(struct sas_task *task)
> -{
> -       struct ata_queued_cmd *qc = task->uldd_task;
> -       struct completion *waiting;
> -
> -       /* Bounce SCSI-initiated commands to the SCSI EH */
> -       if (qc->scsicmd) {
> -               struct request_queue *q = qc->scsicmd->device->request_queue;
> -               unsigned long flags;
> -
> -               spin_lock_irqsave(q->queue_lock, flags);
> -               blk_abort_request(qc->scsicmd->request);
> -               spin_unlock_irqrestore(q->queue_lock, flags);
> -               scsi_schedule_eh(qc->scsicmd->device->host);

Although we don't need this call to scsi_schedule_eh, we certainly
need to handle task->uldd_task differently for ata tasks, will resend.

--
Dan

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [libsas PATCH v10 3/9] libata, libsas: introduce sched_eh and end_eh port ops
  2012-03-11  4:39 ` [libsas PATCH v10 3/9] libata, libsas: introduce sched_eh and end_eh port ops Dan Williams
@ 2012-04-11  2:13   ` Dan Williams
  2012-04-11 11:39     ` Jacek Danecki
  0 siblings, 1 reply; 15+ messages in thread
From: Dan Williams @ 2012-04-11  2:13 UTC (permalink / raw)
  To: linux-scsi; +Cc: Tejun Heo, linux-ide, Jacek Danecki, Jeff Garzik

Ping?  Ack on the ata changes?  I'd like to submit this to
scsi-rc-fixes for 3.4-rc to fix regression fallout around
host_eh_scheduled handling now that libsas ata eh is asynchronous.

On Sat, Mar 10, 2012 at 8:39 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> When managing shost->host_eh_scheduled libata assumes that there is a
> 1:1 shost-to-ata_port relationship.  libsas creates a 1:N relationship
> so it needs to manage host_eh_scheduled cumulatively at the host level.
> The sched_eh and end_eh port port ops allow libsas to track when domain
> devices enter/leave the "eh-pending" state under ha->lock (previously
> named ha->state_lock, but it is no longer just a lock for ha->state
> changes).
>
> Since host_eh_scheduled indicates eh without backing commands pinning
> the device it can be deallocated at any time.  Move the taking of the
> domain_device reference under the port_lock to guarantee that the
> ata_port stays around for the duration of eh.
>
> Cc: Tejun Heo <tj@kernel.org>
> Acked-by: Jacek Danecki <jacek.danecki@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/ata/libata-core.c           |    4 ++
>  drivers/ata/libata-eh.c             |   57 ++++++++++++++++++++++++++++-------
>  drivers/scsi/libsas/sas_ata.c       |   38 +++++++++++++++++++++--
>  drivers/scsi/libsas/sas_discover.c  |    6 ++--
>  drivers/scsi/libsas/sas_event.c     |   12 ++++---
>  drivers/scsi/libsas/sas_init.c      |   14 ++++-----
>  drivers/scsi/libsas/sas_scsi_host.c |   27 +++++++++++++----
>  include/linux/libata.h              |    4 ++
>  include/scsi/libsas.h               |    4 ++
>  include/scsi/sas_ata.h              |    5 +++
>  10 files changed, 134 insertions(+), 37 deletions(-)
>
> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> index 1654c94..827881e 100644
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -79,6 +79,8 @@ const struct ata_port_operations ata_base_port_ops = {
>        .prereset               = ata_std_prereset,
>        .postreset              = ata_std_postreset,
>        .error_handler          = ata_std_error_handler,
> +       .sched_eh               = ata_std_sched_eh,
> +       .end_eh                 = ata_std_end_eh,
>  };
>
>  const struct ata_port_operations sata_port_ops = {
> @@ -6573,6 +6575,8 @@ struct ata_port_operations ata_dummy_port_ops = {
>        .qc_prep                = ata_noop_qc_prep,
>        .qc_issue               = ata_dummy_qc_issue,
>        .error_handler          = ata_dummy_error_handler,
> +       .sched_eh               = ata_std_sched_eh,
> +       .end_eh                 = ata_std_end_eh,
>  };
>
>  const struct ata_port_info ata_dummy_port_info = {
> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
> index c61316e..4f12f63 100644
> --- a/drivers/ata/libata-eh.c
> +++ b/drivers/ata/libata-eh.c
> @@ -793,12 +793,12 @@ void ata_scsi_port_error_handler(struct Scsi_Host *host, struct ata_port *ap)
>                ata_for_each_link(link, ap, HOST_FIRST)
>                        memset(&link->eh_info, 0, sizeof(link->eh_info));
>
> -               /* Clear host_eh_scheduled while holding ap->lock such
> -                * that if exception occurs after this point but
> -                * before EH completion, SCSI midlayer will
> +               /* end eh (clear host_eh_scheduled) while holding
> +                * ap->lock such that if exception occurs after this
> +                * point but before EH completion, SCSI midlayer will
>                 * re-initiate EH.
>                 */
> -               host->host_eh_scheduled = 0;
> +               ap->ops->end_eh(ap);
>
>                spin_unlock_irqrestore(ap->lock, flags);
>                ata_eh_release(ap);
> @@ -986,16 +986,13 @@ void ata_qc_schedule_eh(struct ata_queued_cmd *qc)
>  }
>
>  /**
> - *     ata_port_schedule_eh - schedule error handling without a qc
> - *     @ap: ATA port to schedule EH for
> - *
> - *     Schedule error handling for @ap.  EH will kick in as soon as
> - *     all commands are drained.
> + * ata_std_sched_eh - non-libsas ata_ports issue eh with this common routine
> + * @ap: ATA port to schedule EH for
>  *
> - *     LOCKING:
> + *     LOCKING: inherited from ata_port_schedule_eh
>  *     spin_lock_irqsave(host lock)
>  */
> -void ata_port_schedule_eh(struct ata_port *ap)
> +void ata_std_sched_eh(struct ata_port *ap)
>  {
>        WARN_ON(!ap->ops->error_handler);
>
> @@ -1007,6 +1004,44 @@ void ata_port_schedule_eh(struct ata_port *ap)
>
>        DPRINTK("port EH scheduled\n");
>  }
> +EXPORT_SYMBOL_GPL(ata_std_sched_eh);
> +
> +/**
> + * ata_std_end_eh - non-libsas ata_ports complete eh with this common routine
> + * @ap: ATA port to end EH for
> + *
> + * In the libata object model there is a 1:1 mapping of ata_port to
> + * shost, so host fields can be directly manipulated under ap->lock, in
> + * the libsas case we need to hold a lock at the ha->level to coordinate
> + * these events.
> + *
> + *     LOCKING:
> + *     spin_lock_irqsave(host lock)
> + */
> +void ata_std_end_eh(struct ata_port *ap)
> +{
> +       struct Scsi_Host *host = ap->scsi_host;
> +
> +       host->host_eh_scheduled = 0;
> +}
> +EXPORT_SYMBOL(ata_std_end_eh);
> +
> +
> +/**
> + *     ata_port_schedule_eh - schedule error handling without a qc
> + *     @ap: ATA port to schedule EH for
> + *
> + *     Schedule error handling for @ap.  EH will kick in as soon as
> + *     all commands are drained.
> + *
> + *     LOCKING:
> + *     spin_lock_irqsave(host lock)
> + */
> +void ata_port_schedule_eh(struct ata_port *ap)
> +{
> +       /* see: ata_std_sched_eh, unless you know better */
> +       ap->ops->sched_eh(ap);
> +}
>
>  static int ata_do_link_abort(struct ata_port *ap, struct ata_link *link)
>  {
> diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
> index 729a7b6..d30a093 100644
> --- a/drivers/scsi/libsas/sas_ata.c
> +++ b/drivers/scsi/libsas/sas_ata.c
> @@ -523,6 +523,31 @@ static void sas_ata_set_dmamode(struct ata_port *ap, struct ata_device *ata_dev)
>                i->dft->lldd_ata_set_dmamode(dev);
>  }
>
> +static void sas_ata_sched_eh(struct ata_port *ap)
> +{
> +       struct domain_device *dev = ap->private_data;
> +       struct sas_ha_struct *ha = dev->port->ha;
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&ha->lock, flags);
> +       if (!test_and_set_bit(SAS_DEV_EH_PENDING, &dev->state))
> +               ha->eh_active++;
> +       ata_std_sched_eh(ap);
> +       spin_unlock_irqrestore(&ha->lock, flags);
> +}
> +
> +void sas_ata_end_eh(struct ata_port *ap)
> +{
> +       struct domain_device *dev = ap->private_data;
> +       struct sas_ha_struct *ha = dev->port->ha;
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&ha->lock, flags);
> +       if (test_and_clear_bit(SAS_DEV_EH_PENDING, &dev->state))
> +               ha->eh_active--;
> +       spin_unlock_irqrestore(&ha->lock, flags);
> +}
> +
>  static struct ata_port_operations sas_sata_ops = {
>        .prereset               = ata_std_prereset,
>        .hardreset              = sas_ata_hard_reset,
> @@ -536,6 +561,8 @@ static struct ata_port_operations sas_sata_ops = {
>        .port_start             = ata_sas_port_start,
>        .port_stop              = ata_sas_port_stop,
>        .set_dmamode            = sas_ata_set_dmamode,
> +       .sched_eh               = sas_ata_sched_eh,
> +       .end_eh                 = sas_ata_end_eh,
>  };
>
>  static struct ata_port_info sata_port_info = {
> @@ -684,10 +711,6 @@ static void async_sas_ata_eh(void *data, async_cookie_t cookie)
>        struct ata_port *ap = dev->sata_dev.ap;
>        struct sas_ha_struct *ha = dev->port->ha;
>
> -       /* hold a reference over eh since we may be racing with final
> -        * remove once all commands are completed
> -        */
> -       kref_get(&dev->kref);
>        sas_ata_printk(KERN_DEBUG, dev, "dev error handler\n");
>        ata_scsi_port_error_handler(ha->core.shost, ap);
>        sas_put_device(dev);
> @@ -730,6 +753,13 @@ void sas_ata_strategy_handler(struct Scsi_Host *shost)
>                list_for_each_entry(dev, &port->dev_list, dev_list_node) {
>                        if (!sas_ata_dev_eh_valid(dev))
>                                continue;
> +
> +                       /* hold a reference over eh since we may be
> +                        * racing with final remove once all commands
> +                        * are completed
> +                        */
> +                       kref_get(&dev->kref);
> +
>                        async_schedule_domain(async_sas_ata_eh, dev, &async);
>                }
>                spin_unlock(&port->dev_list_lock);
> diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
> index bd8c636..a730ac9 100644
> --- a/drivers/scsi/libsas/sas_discover.c
> +++ b/drivers/scsi/libsas/sas_discover.c
> @@ -282,6 +282,8 @@ static void sas_unregister_common_dev(struct asd_sas_port *port, struct domain_d
>
>        spin_lock_irq(&port->dev_list_lock);
>        list_del_init(&dev->dev_list_node);
> +       if (dev_is_sata(dev))
> +               sas_ata_end_eh(dev->sata_dev.ap);
>        spin_unlock_irq(&port->dev_list_lock);
>
>        sas_put_device(dev);
> @@ -479,9 +481,9 @@ static void sas_chain_event(int event, unsigned long *pending,
>        if (!test_and_set_bit(event, pending)) {
>                unsigned long flags;
>
> -               spin_lock_irqsave(&ha->state_lock, flags);
> +               spin_lock_irqsave(&ha->lock, flags);
>                sas_chain_work(ha, sw);
> -               spin_unlock_irqrestore(&ha->state_lock, flags);
> +               spin_unlock_irqrestore(&ha->lock, flags);
>        }
>  }
>
> diff --git a/drivers/scsi/libsas/sas_event.c b/drivers/scsi/libsas/sas_event.c
> index 4e4292d..789c4d8 100644
> --- a/drivers/scsi/libsas/sas_event.c
> +++ b/drivers/scsi/libsas/sas_event.c
> @@ -47,9 +47,9 @@ static void sas_queue_event(int event, unsigned long *pending,
>        if (!test_and_set_bit(event, pending)) {
>                unsigned long flags;
>
> -               spin_lock_irqsave(&ha->state_lock, flags);
> +               spin_lock_irqsave(&ha->lock, flags);
>                sas_queue_work(ha, work);
> -               spin_unlock_irqrestore(&ha->state_lock, flags);
> +               spin_unlock_irqrestore(&ha->lock, flags);
>        }
>  }
>
> @@ -61,18 +61,18 @@ void __sas_drain_work(struct sas_ha_struct *ha)
>
>        set_bit(SAS_HA_DRAINING, &ha->state);
>        /* flush submitters */
> -       spin_lock_irq(&ha->state_lock);
> -       spin_unlock_irq(&ha->state_lock);
> +       spin_lock_irq(&ha->lock);
> +       spin_unlock_irq(&ha->lock);
>
>        drain_workqueue(wq);
>
> -       spin_lock_irq(&ha->state_lock);
> +       spin_lock_irq(&ha->lock);
>        clear_bit(SAS_HA_DRAINING, &ha->state);
>        list_for_each_entry_safe(sw, _sw, &ha->defer_q, drain_node) {
>                list_del_init(&sw->drain_node);
>                sas_queue_work(ha, sw);
>        }
> -       spin_unlock_irq(&ha->state_lock);
> +       spin_unlock_irq(&ha->lock);
>  }
>
>  int sas_drain_work(struct sas_ha_struct *ha)
> diff --git a/drivers/scsi/libsas/sas_init.c b/drivers/scsi/libsas/sas_init.c
> index 10cb5ae..6909fef 100644
> --- a/drivers/scsi/libsas/sas_init.c
> +++ b/drivers/scsi/libsas/sas_init.c
> @@ -114,7 +114,7 @@ int sas_register_ha(struct sas_ha_struct *sas_ha)
>                sas_ha->lldd_queue_size = 128; /* Sanity */
>
>        set_bit(SAS_HA_REGISTERED, &sas_ha->state);
> -       spin_lock_init(&sas_ha->state_lock);
> +       spin_lock_init(&sas_ha->lock);
>        mutex_init(&sas_ha->drain_mutex);
>        INIT_LIST_HEAD(&sas_ha->defer_q);
>
> @@ -163,9 +163,9 @@ int sas_unregister_ha(struct sas_ha_struct *sas_ha)
>         * events to be queued, and flush any in-progress drainers
>         */
>        mutex_lock(&sas_ha->drain_mutex);
> -       spin_lock_irq(&sas_ha->state_lock);
> +       spin_lock_irq(&sas_ha->lock);
>        clear_bit(SAS_HA_REGISTERED, &sas_ha->state);
> -       spin_unlock_irq(&sas_ha->state_lock);
> +       spin_unlock_irq(&sas_ha->lock);
>        __sas_drain_work(sas_ha);
>        mutex_unlock(&sas_ha->drain_mutex);
>
> @@ -411,9 +411,9 @@ static int queue_phy_reset(struct sas_phy *phy, int hard_reset)
>        d->reset_result = 0;
>        d->hard_reset = hard_reset;
>
> -       spin_lock_irq(&ha->state_lock);
> +       spin_lock_irq(&ha->lock);
>        sas_queue_work(ha, &d->reset_work);
> -       spin_unlock_irq(&ha->state_lock);
> +       spin_unlock_irq(&ha->lock);
>
>        rc = sas_drain_work(ha);
>        if (rc == 0)
> @@ -438,9 +438,9 @@ static int queue_phy_enable(struct sas_phy *phy, int enable)
>        d->enable_result = 0;
>        d->enable = enable;
>
> -       spin_lock_irq(&ha->state_lock);
> +       spin_lock_irq(&ha->lock);
>        sas_queue_work(ha, &d->enable_work);
> -       spin_unlock_irq(&ha->state_lock);
> +       spin_unlock_irq(&ha->lock);
>
>        rc = sas_drain_work(ha);
>        if (rc == 0)
> diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
> index 49a9113..cd63d80 100644
> --- a/drivers/scsi/libsas/sas_scsi_host.c
> +++ b/drivers/scsi/libsas/sas_scsi_host.c
> @@ -667,16 +667,20 @@ static void sas_eh_handle_sas_errors(struct Scsi_Host *shost, struct list_head *
>        goto out;
>  }
>
> +
>  void sas_scsi_recover_host(struct Scsi_Host *shost)
>  {
>        struct sas_ha_struct *ha = SHOST_TO_SAS_HA(shost);
> -       unsigned long flags;
>        LIST_HEAD(eh_work_q);
> +       int tries = 0;
> +       bool retry;
>
> -       spin_lock_irqsave(shost->host_lock, flags);
> +retry:
> +       tries++;
> +       retry = true;
> +       spin_lock_irq(shost->host_lock);
>        list_splice_init(&shost->eh_cmd_q, &eh_work_q);
> -       shost->host_eh_scheduled = 0;
> -       spin_unlock_irqrestore(shost->host_lock, flags);
> +       spin_unlock_irq(shost->host_lock);
>
>        SAS_DPRINTK("Enter %s busy: %d failed: %d\n",
>                    __func__, shost->host_busy, shost->host_failed);
> @@ -710,8 +714,19 @@ out:
>
>        scsi_eh_flush_done_q(&ha->eh_done_q);
>
> -       SAS_DPRINTK("--- Exit %s: busy: %d failed: %d\n",
> -                   __func__, shost->host_busy, shost->host_failed);
> +       /* check if any new eh work was scheduled during the last run */
> +       spin_lock_irq(&ha->lock);
> +       if (ha->eh_active == 0) {
> +               shost->host_eh_scheduled = 0;
> +               retry = false;
> +       }
> +       spin_unlock_irq(&ha->lock);
> +
> +       if (retry)
> +               goto retry;
> +
> +       SAS_DPRINTK("--- Exit %s: busy: %d failed: %d tries: %d\n",
> +                   __func__, shost->host_busy, shost->host_failed, tries);
>  }
>
>  enum blk_eh_timer_return sas_scsi_timed_out(struct scsi_cmnd *cmd)
> diff --git a/include/linux/libata.h b/include/linux/libata.h
> index 42378d6..83ff256 100644
> --- a/include/linux/libata.h
> +++ b/include/linux/libata.h
> @@ -845,6 +845,8 @@ struct ata_port_operations {
>        void (*error_handler)(struct ata_port *ap);
>        void (*lost_interrupt)(struct ata_port *ap);
>        void (*post_internal_cmd)(struct ata_queued_cmd *qc);
> +       void (*sched_eh)(struct ata_port *ap);
> +       void (*end_eh)(struct ata_port *ap);
>
>        /*
>         * Optional features
> @@ -1165,6 +1167,8 @@ extern void ata_do_eh(struct ata_port *ap, ata_prereset_fn_t prereset,
>                      ata_reset_fn_t softreset, ata_reset_fn_t hardreset,
>                      ata_postreset_fn_t postreset);
>  extern void ata_std_error_handler(struct ata_port *ap);
> +extern void ata_std_sched_eh(struct ata_port *ap);
> +extern void ata_std_end_eh(struct ata_port *ap);
>  extern int ata_link_nr_enabled(struct ata_link *link);
>
>  /*
> diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
> index f4f1c96..c0c94b3 100644
> --- a/include/scsi/libsas.h
> +++ b/include/scsi/libsas.h
> @@ -177,6 +177,7 @@ struct sata_device {
>  enum {
>        SAS_DEV_GONE,
>        SAS_DEV_DESTROY,
> +       SAS_DEV_EH_PENDING,
>  };
>
>  struct domain_device {
> @@ -384,7 +385,8 @@ struct sas_ha_struct {
>        struct list_head  defer_q; /* work queued while draining */
>        struct mutex      drain_mutex;
>        unsigned long     state;
> -       spinlock_t        state_lock;
> +       spinlock_t        lock;
> +       int               eh_active;
>
>        struct mutex disco_mutex;
>
> diff --git a/include/scsi/sas_ata.h b/include/scsi/sas_ata.h
> index 6d5d60c..3e13c28 100644
> --- a/include/scsi/sas_ata.h
> +++ b/include/scsi/sas_ata.h
> @@ -44,6 +44,7 @@ void sas_ata_eh(struct Scsi_Host *shost, struct list_head *work_q,
>  void sas_ata_schedule_reset(struct domain_device *dev);
>  void sas_ata_wait_eh(struct domain_device *dev);
>  void sas_probe_sata(struct asd_sas_port *port);
> +void sas_ata_end_eh(struct ata_port *ap);
>  #else
>
>
> @@ -81,6 +82,10 @@ static inline int sas_get_ata_info(struct domain_device *dev, struct ex_phy *phy
>  {
>        return 0;
>  }
> +
> +static inline void sas_ata_end_eh(struct ata_port *ap)
> +{
> +}
>  #endif
>
>  #endif /* _SAS_ATA_H_ */
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [libsas PATCH v10 3/9] libata, libsas: introduce sched_eh and end_eh port ops
  2012-04-11  2:13   ` Dan Williams
@ 2012-04-11 11:39     ` Jacek Danecki
  2012-04-11 18:04       ` Dan Williams
  0 siblings, 1 reply; 15+ messages in thread
From: Jacek Danecki @ 2012-04-11 11:39 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-scsi, Tejun Heo, linux-ide, Jeff Garzik

On 04/11/12 04:13, Dan Williams wrote:
> Ping?  Ack on the ata changes?

Acked-by: Jacek Danecki <jacek.danecki@intel.com>

-- 
Jacek

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [libsas PATCH v10 3/9] libata, libsas: introduce sched_eh and end_eh port ops
  2012-04-11 11:39     ` Jacek Danecki
@ 2012-04-11 18:04       ` Dan Williams
  0 siblings, 0 replies; 15+ messages in thread
From: Dan Williams @ 2012-04-11 18:04 UTC (permalink / raw)
  To: Jacek Danecki; +Cc: linux-scsi, Tejun Heo, linux-ide, Jeff Garzik

On Wed, Apr 11, 2012 at 4:39 AM, Jacek Danecki <Jacek.Danecki@intel.com> wrote:
> On 04/11/12 04:13, Dan Williams wrote:
>>
>> Ping?  Ack on the ata changes?
>
>
> Acked-by: Jacek Danecki <jacek.danecki@intel.com>

Thanks Jacek, I was angling for Tejun or Jeff to weigh in with an ack
so I can take this through James' tree.

--
Dan
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2012-04-11 18:04 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-03-11  4:38 [libsas PATCH v10 0/9] libsas error handling + discovery v10 Dan Williams
2012-03-11  4:39 ` [libsas PATCH v10 1/9] libsas: introduce sas_work to fix sas_drain_work vs sas_queue_work Dan Williams
2012-03-11  4:39 ` [libsas PATCH v10 2/9] libsas: cleanup spurious calls to scsi_schedule_eh Dan Williams
2012-03-16 15:25   ` Dan Williams
2012-03-11  4:39 ` [libsas PATCH v10 3/9] libata, libsas: introduce sched_eh and end_eh port ops Dan Williams
2012-04-11  2:13   ` Dan Williams
2012-04-11 11:39     ` Jacek Danecki
2012-04-11 18:04       ` Dan Williams
2012-03-11  4:39 ` [libsas PATCH v10 4/9] libsas: enforce eh strategy handlers only in eh context Dan Williams
2012-03-11  4:39 ` [libsas PATCH v10 5/9] libsas: add sas_eh_abort_handler Dan Williams
2012-03-11  4:39 ` [libsas PATCH v10 6/9] libsas: use ->lldd_I_T_nexus_reset for ->eh_bus_reset_handler Dan Williams
2012-03-11  5:37   ` jack_wang
2012-03-11  4:39 ` [libsas PATCH v10 7/9] libsas: trim sas_task of slow path infrastructure Dan Williams
2012-03-11  4:39 ` [libsas PATCH v10 8/9] libsas: fix sas_find_bcast_phy() in the presence of 'vacant' phys Dan Williams
2012-03-11  4:39 ` [libsas PATCH v10 9/9] libsas: sas_rediscover_dev did not look at the SMP exec status Dan Williams

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.